


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXX December, 1939 Number 9 




STUDIES IN RETENTION
HERBERT F. SPITZER 
State University of Iowa* 

















The present investigation was planned with special reference to 
(1) the effect of recall on retention, (2) the relationship between the 
rate of forgetting and the ability of the subjects, and (3) the effect of 
item difficulty on the form of retention curves. 


PROBLEM 


The importance of retention is shown by the fact that growth or 
improvement in skills, knowledges, and attitudes is dependent upon 
the learner’s retention of the effects of previous experience. The use 
of recall as an aid to retention has been emphasized by theorists on 
methods of study and by investigators in the field of memory. The 
lack of experimental evidence on the effect of recall on retention where 
conditions approach those of actual schoolroom practice prompted this 
investigation. The primary purpose of the investigation was to determine
the effect of recall on retention of facts which children acquire
through reading when the materials and the methods of study are
similar to those used in classroom situations. Two subsidiary problems
of the investigation relate to the effect of item difficulty upon the
form of retention curves and to the relationship between the learning 
ability of the students and the rate of forgetting. 

Related Research.—A number of experimental studies have dealt 
with various aspects of the problem of this investigation. Myers⁵
found that immediate recall in the form of written reproduction was
beneficial to later reproduction of a list of unrelated words. An indirect
or incidental method of learning the words had been employed.





* This article reports, in part, an investigation conducted by the writer as a 
partial fulfillment of the requirements for the Ph.D. degree in the College of 
Education of the State University of Iowa. The assistance of Dr. Ernest Horn, 
under whose direction the study was conducted, is gratefully acknowledged. 



Gates² found that some recall in the form of recitation was an aid in
memorizing biographical prose. This was true when tests were given
immediately and also after four hours. Gates also reports finding a
high positive correlation between immediate and delayed recall.
Yoakam⁷ concluded that an immediate recall in the form of a test was
of more value to retention than was a single reading. Jones³ found
that recall tests aided the retention of information acquired from
lectures by college students. Other studies which are related to some
phases of this study are reported by Dietze,¹ Keys,⁴ Raffel,⁶ and
Young.⁸


SUBJECTS AND MATERIAL 


The subjects for this experiment were thirty-six hundred five 
sixth-grade pupils of nine Iowa cities. This was the entire sixth-grade 
population of ninety-one elementary schools. These schools were 
arbitrarily divided into ten groups. Groups I, II, III, IV, V, VI, VII, 
and VIII, each comprising approximately four hundred children, were 
used to obtain the data for the main part of the experiment. Group IX 
was used to obtain data on the effect of immediate repetition of tests, 
and Group X was used to obtain information on previous knowledge. 
Thus Groups IX and X were, in a sense, used as control groups. 

The reading materials used in this study were two articles, six 
hundred four and five hundred seventy-seven words in length, printed 
in a six-page folder. The first of these articles, entitled “Peanuts,”
designated A, and the accompanying test, were used as a sectioning
device; while the second article, entitled “Bamboos,” designated B,
and the accompanying test, were used to obtain the data on retention. 
An attempt was made to write articles with content that would be 
relatively new to the children, highly factual, authentic, of the proper 
difficulty, and similar in type to the material that children read in their 
regular school work. Both of the topics dealt with are treated briefly 
if at all in geography books used by the children. The first paragraph 
of Article B and five of the items of Test B, which are wholly or partially 
based on the content of this paragraph, are given below. 


Near Savannah, Georgia, is the Plant Introduction Garden of the United
States Department of Agriculture. In one section of this garden are the 
bamboos. These plants are members of the grass family. They resemble 
their relatives, corn and wheat, in structure of stem which is rounded, divided 
into joints, and more or less hollow. Bamboos also resemble pines and 
spruces by having tall, straight trunks and cone shaped heads or crowns, and 










by being evergreens. Although they are evergreens, a new set of leaves 


appears each spring. On rare occasions bamboos flower and produce seeds. 
After flowering, the plant usually dies. 


1. Who maintains an experimental garden near Savannah, Georgia? 
( ) The Bamboo Growers Association ( ) The U. S. Government ( ) 
The U. S. Custom Office ( ) The florists of Savannah ( ) The State of 
Georgia. 

2. To which family of plants do bamboos belong? ( ) trees ( ) ferns
( ) grasses ( ) mosses ( ) fungi.

4. Which two trees do the bamboos resemble most? ( ) royal and date
palms ( ) willow and tamarack ( ) white oak and birch ( ) walnut and
hickory ( ) pine and spruce.


18. How often do bamboos produce seed? ( ) every spring ( ) never 
( ) on rare occasions ( ) every other year ( ) every third year. 

19. What usually happens to a bamboo plant after the flowering period? 
( ) It dies ( ) It begins a new growth ( ) It sends up new plants from 
the roots ( ) It begins to branch out ( ) It begins to grow a rough bark.


A twenty-five item test (see example above) on each article was 
used to measure retention. Through an item analysis of each test, the 
difficulty and discriminating power (tetrachoric “r”) of each item
were obtained. Items too difficult or of poor discriminating power
were eliminated after the preliminary trials, leaving a total of twenty- 
five items in each test. The reliability of the two tests determined by 
equal halves technique was .77 and .80, respectively. The correlation 
between Test A and Test B based on scores of six hundred twenty-two
pupils who took both tests immediately after reading the articles
was .76.


PROCEDURE 


Both the reading material and the tests were presented to the 
pupils by their regular teachers according to directions supplied by 
the investigator. In the printed directions given at the beginning of 
the experiment, the children were told that they were taking part in a 
learning experiment. They were also told that they were to try to 
remember the information given in the articles. Similar directions 
were printed at the top of each article. 

On the first day of the experiment, all the children in Groups I-VIII 
read Article A and took Test A. They also read Article B, but only
Groups I and II took Test B₁.* The other six groups took a test





* The first time pupils took Test B, it was labeled B₁; the second time they took
Test B, it was labeled B₂; and the third time, it was labeled B₃.










TABLE I.—DIAGRAM OF PROCEDURE

                        Time in days
Group      0      1      7     14     21     28     63
I          B₁     B₂                  B₃
II         B₁            B₂                         B₃
III               B₁            B₂
IV                       B₁           B₂
V                               B₁           B₂
VI                                    B₁            B₂
VII                                          B₁
VIII                                                B₁








which had little relation to the content of Article B. This test was 
given in order to keep them from expecting a later test. The real 
Test B was given to these six groups (III-VIII) at varying time 
intervals after the start of the experiment. A diagram of the testing 
procedure followed is given in Table I. This table shows that the 
pupils of Group I took Test B₁ immediately after reading Article B,
repeated the test (Test B₂) after one day, and again repeated the same
test (Test B₃) after twenty-one days. The pupils of Group IV took
Test B₁ seven days after reading Article B and repeated the test
(Test B₂) twenty-one days after reading the article. The procedure
followed by the other six groups is shown in the table. 

The reading materials were not referred to after the initial study 
period, nor did the pupils know that there were to be delayed tests. 
The teachers were instructed not to discuss the articles or the tests 
with the pupils. Pupils were given eight minutes to read the articles 
and ten minutes for each of the tests. 

The pupils of Group IX read both articles and took both tests on
the first day of the experiment. They took Test B a second time 
immediately after completing the first attempt. For this second test 
they were instructed to try to improve their first score. This pro- 
cedure was followed for the purpose of obtaining information on the 
effect of repetition of the tests. The pupils of Group X took Test B 


without having read Article B, for the purpose of obtaining data on 
previous knowledge. 


RESULTS 


Effect of Recall.—From a single frequency distribution of the scores
on Test A, the mean score and the standard deviation of scores of the 




population studied were determined. The means and standard 
deviations of the scores of the pupils on Test A in each of Groups I-VIII 
were then calculated. A comparison of these means with the mean 
for the entire population showed that each of the groups was practically 
representative of the population. However, some pupil scores were 
deleted from two of the groups in order to make the groups more 
nearly equal. The means of these equated groups are given in 
Table II. The critical ratio of the largest difference between means 
on Test A of any two of these equalized groups is only .18. 


TABLE II.—TEST B RESULTS OF THE GROUPS THAT MADE EQUIVALENT SCORES ON TEST A

Group      N     M test A    Test*    M test B    SD test B
I         286     15.03       B₁       13.23        4.69
I         ...      ...        B₂       13.07        4.57
I         ...      ...        B₃       12.18        4.59
II        338     15.05       B₁       13.20        4.50
II        ...      ...        B₂       11.84        4.64
II        ...      ...        B₃       10.74        4.22
III       367     15.00       B₁        9.56        4.24
III       ...      ...        B₂        8.93        4.06
IV        337     15.00       B₁        7.87        3.56
IV        ...      ...        B₂        8.15        3.83
V         371     15.04       B₁        6.97        3.53
V         ...      ...        B₂        7.10        3.21
VI        379     15.04       B₁        6.49        2.91
VI        ...      ...        B₂        7.07        3.08
VII       365     15.00       B₁        6.80        3.03
VIII      350     15.03       B₁        6.38        2.71











* For identification of various Test B’s and the time after learning that each 
was taken, see Table I. 


The mean scores of all groups of pupils who took Test B are shown 
in Table II. In interpreting the data of this table the assumption is 
made that the eight groups of pupils profited equally from the reading 
and that the groups possessed equal ability to retain the effects of the 
reading. This assumption is based on the fact that the groups were
equated on the sectioning test (Test A). According to the assumption
stated above, Group I would have made a mean score of approximately
9.56 (mean score of Group III on Test B₁) one day after reading the
article if the group had not been given the immediate recall test. This
last statement is based on the fact that Groups I and III were originally 







equal. Therefore, had Group I delayed taking Test B until one day 
after reading, the mean score made by the group would have been 
the same as that made by Group III, or 9.56. On the same basis, had 
Groups III, IV, V, VI, VII, and VIII been tested immediately after reading
Article B, each group would have made a mean score of approximately
13.22. The data of this table show that more is forgotten in one
day without recall than is forgotten in sixty-three days with the aid of
recall, as is shown by a comparison of the scores of II B₃ and VIII B₁.
The differences between groups that were originally comparable and now
differ by only one recall, together with the critical ratios of these differences, are shown
in Table III. These critical ratios would have been larger had the 
standard error of difference formula for matched groups been used.* 
This formula was not used because a statistically significant difference 
was obtained without its use. The data summarized in Tables II 
and III show clearly that retention benefited significantly by recall. 
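
The critical ratios of Table III can be approximated from the means, standard deviations, and group sizes of Table II. The Python sketch below is illustrative only. It assumes that the standard error of a mean is SD/√N and that the standard error of a difference is the usual formula for independent groups (the text states only that the matched-groups formula was not used). The function names are hypothetical, and small departures from the printed values are to be expected from rounding.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean, assumed here to be SD / sqrt(N)."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Return (CR, SE of difference) for two independent group means."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff, se_diff


# Values from Table II: Group I on Test B2 and Group III on Test B1,
# both taken one day after reading Article B.
se_i = standard_error(4.57, 286)
se_iii = standard_error(4.24, 367)
cr, se_diff = critical_ratio(13.07, se_i, 9.56, se_iii)
print(f"SE of difference = {se_diff:.3f}, CR = {cr:.2f}")
```

For this pair of groups the sketch yields a critical ratio of about 10, in line with the value printed in Table III.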
The effects of recall on retention are shown graphically in the 
figures (see Figs. 1, 2, 3, and 4). The points on the graph represent the
mean performance of the different groups. For example, the point
“III B₁” represents the mean score of the pupils of Group III on their


TABLE III.—COMPARISON OF TEST B RESULTS FOR GROUPS DIFFERING BY ONLY ONE RECALL

Group    Test     AM      SE of AM    Difference    SE of difference     CR
I         B₂     13.07      .27
III       B₁      9.56      .22          3.51            .346           10.10
I         B₃     12.18      .28
IV        B₂      8.15      .21          4.03            .350           11.51
II        B₂     11.84      .25
IV        B₁      7.87      .19          3.97            .317           12.50
II        B₃     10.74      .24
VI        B₂      7.07      .16          3.67            .288           12.74
III       B₂      8.93      .22
V         B₁      6.97      .18          1.96            .286            6.85
IV        B₂      8.15      .21
VI        B₁      6.49      .16          1.66            .264            6.29
V         B₂      7.10      .18
VII       B₁      6.80      .16           .30            .241            1.25


























* Lindquist, E. F. and Wilks, S. S.: “The Significance of a Difference between
Matched Groups.” J. Educ. Psycho., Vol. xxii, 1931, pp. 197-204, 205-208.







first attempt on Test B. This trial was taken one day after reading 
Article B. Since the eight groups were equalized according to their 
performance on Test A, which was quite similar to Test B, it is 
assumed that the facts which Groups I and II acquired through reading 
would have been forgotten at the rate shown by the solid line if they 
had not been given the intervening recall tests. Thus, the space 
separating a point on the solid line and a vertical point on any of the 
broken lines represents the effect of the recall test on retention for that 
particular situation. Figures 1 and 2, which are based on the entire 
population, show that immediate forgetting unaided by recall was very 
rapid and that in every case recall was beneficial to retention. 










































































FIG. 1. CURVES OF RETENTION FOR THE ENTIRE POPULATION WHEN THE
AMOUNT RETAINED IS EXPRESSED IN RAW SCORES
[Figure not reproduced. Horizontal axis: time in days (1, 7, 14, 21, 28, 63). Vertical axis: raw score on Test B. Solid line: retention unaided by recall. Broken lines: retention aided by recall.]


When the papers were corrected for guessing through use of the
formula S = R - W/(N - 1), the critical ratios of differences remained
practically the same as those shown in Table III.
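
The correction for guessing can be illustrated briefly. The sketch below assumes that the formula intended is the conventional one, S = R - W/(N - 1), with R the number of right answers, W the number of wrong answers, and N the number of choices per item (five on Test B, judging from the sample items). The function name and the example scores are hypothetical.

```python
def corrected_score(right: int, wrong: int, choices: int = 5) -> float:
    """Score corrected for guessing: right answers minus wrong / (choices - 1).

    Omitted items are counted as neither right nor wrong.
    """
    return right - wrong / (choices - 1)


# A hypothetical pupil with 13 right and 12 wrong on a 25-item test.
print(corrected_score(13, 12))

# A hypothetical pupil answering at chance level (5 right, 20 wrong).
print(corrected_score(5, 20))
```

Under this assumption a pupil who guesses blindly on every five-choice item is expected to receive a corrected score of zero.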

The one hundred sixty-nine children of Group IX who were given
a second Test B immediately after taking the first test improved their 
mean score only .03. Thus, an immediate repetition of the test did not 
result in a large increase in the number of facts acquired. 

The mean score of the three hundred one children of Group X who 
were tested for previous knowledge was 5.27. The assumption that 
this score represents the previous knowledge of Groups I-VIII is 
rather questionable since the foils or wrong responses of items in the 
test were not as plausible for those who had not read the article 
(group tested for previous knowledge) as these responses were for those 







pupils who had read the article. The previous knowledge of Groups 
I-VIII, then, was probably less than 5.27. Since no other measure of 
previous knowledge was available, this amount was subtracted from 
the mean score of each group in obtaining the data from which Fig. 2 
was constructed. This figure, subject to the limitation given above, 
shows the curves of retention when the amount retained is given as a 
per cent of the amount learned. The figure shows the same trend as
that shown in Fig. 1. In this case, however, forgetting is more rapid 








FIG. 2. CURVES OF RETENTION FOR THE ENTIRE POPULATION WHEN
AMOUNT RETAINED IS EXPRESSED AS A PER CENT OF THE
AMOUNT LEARNED
[Figure not reproduced. Horizontal axis: time in days. Vertical axis: amount retained, in per cent of amount learned.]


and the curve of retention unaided by recall begins to level much 
nearer the zero line. 
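
The conversion behind Fig. 2 can be sketched as follows. The text does not give the exact computation, so this example assumes that the amount learned is the immediate mean (13.22) less the previous-knowledge mean (5.27), and that each later mean is reduced by the same 5.27 before being expressed as a per cent. The means are those of the groups tested for the first time at each interval (Table II); the variable and function names are hypothetical.

```python
PREVIOUS_KNOWLEDGE = 5.27   # mean of Group X, tested without reading Article B
IMMEDIATE_MEAN = 13.22      # approximate immediate mean on Test B

# First-attempt means on Test B for groups tested without prior recall,
# keyed by days elapsed since reading (Groups III to VIII, Table II).
unaided_means = {1: 9.56, 7: 7.87, 14: 6.97, 21: 6.49, 28: 6.80, 63: 6.38}


def percent_retained(mean_score: float) -> float:
    """Amount retained as a per cent of the amount learned (assumed definition)."""
    learned = IMMEDIATE_MEAN - PREVIOUS_KNOWLEDGE
    return 100 * (mean_score - PREVIOUS_KNOWLEDGE) / learned


for days, mean_score in unaided_means.items():
    print(f"day {days:2d}: {percent_retained(mean_score):5.1f} per cent retained")
```

On these assumptions a little over half of the amount learned remains after one day, which agrees with the statement that forgetting appears more rapid in Fig. 2 than in Fig. 1.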


LEARNING ABILITY AND RETENTION 


In order to determine whether ability to learn affected the shape of 
the retention curve, the Test B scores of those pupils who scored in the 
upper one third on Test A were compared with the Test B scores of 
those in the lower one third on Test A. The data for making these 
comparisons are shown in Tables IV and V. The data on the upper 
and lower thirds are shown graphically in Figs. 3 and 4. An examination
of these figures shows that the curve of retention of pupils of
inferior learning ability begins to level, or has already reached the horizontal,
seven days after learning. The curve for the pupils
with superior learning ability, by contrast, does not begin to level until twenty-one
days after the learning period.




TABLE IV.—TEST B RESULTS OF PUPILS WHO MADE SCORES IN THE UPPER AND
LOWER THIRDS ON TEST A

Group    Division     N     AM test A    Test    AM test B    SD of test B
I        Upper        95     19.62        B₁      17.02         3.57
I        Upper       ...      ...         B₂      16.76         3.36
I        Upper       ...      ...         B₃      15.47         3.87
I        Lower        95     10.42        B₁       9.67         3.40
I        Lower       ...      ...         B₂       9.67         3.21
I        Lower       ...      ...         B₃       9.19         3.36
II       Upper       112     19.17        B₁      17.18         2.70*
II       Upper       ...      ...         B₂      16.10         3.64†
II       Upper       ...      ...         B₃      14.57         3.42†
II       Lower       112     10.92        B₁      10.01         3.23
II       Lower       ...      ...         B₂       8.55         3.25
II       Lower       ...      ...         B₃       7.39         2.49
III      Upper       123     19.33        B₁      12.58         4.02
III      Lower       123     10.90        B₁       7.09         2.52
IV       Upper       113     19.27        B₁      10.39         3.36
IV       Lower       113     10.92        B₁       5.48         2.43
V        Upper       124     19.54        B₁       9.28         3.47
V        Lower       124     10.84        B₁       5.22         2.30
VI       Upper       130     19.58        B₁       8.46         3.21
VI       Lower       130     10.76        B₁       6.33         2.03
VII      Upper       122     19.47        B₁       8.50         3.11
VII      Lower       122     10.57        B₁       5.57         2.13
VIII     Upper       117     19.43        B₁       7.74         2.84
VIII     Lower       117     10.79        B₁       5.16         1.95

* Unusually small because high scores were deleted in the equalization
procedure.
† Not equalized.


TABLE V.—COMPARISON OF TEST B RESULTS OF UPPER AND LOWER THIRDS FOR
GROUPS DIFFERING BY ONLY ONE RECALL

Group    Division    Test     AM      SE of AM    Difference of AM's    SE of difference     CR
I        Upper        B₂     16.76      .34
III      Upper        B₁     12.58      .36            4.18                  .495            8.44
I        Lower        B₂      9.67      .33
III      Lower        B₁      7.09      .23            2.58                  .402            6.41
II       Upper        B₂     16.10      .33
IV       Upper        B₁     10.39      .31            5.71                  .463           12.33
II       Lower        B₂      8.55      .30
IV       Lower        B₁      5.48      .23            3.07                  .378            8.12































Because of the fact that the data of Tables IV and V and Figs. 3 
and 4 are based on scores neither corrected for guessing nor corrected 
for previous knowledge, the rate of forgetting shown is probably much 
less rapid than the true rate of forgetting. According to the data as 
presented the pupils in the upper one third forgot twenty-six per cent 







FIG. 3. CURVES OF RETENTION FOR THE PUPILS IN THE UPPER ONE
THIRD OF SCORES ON TEST A
[Figure not reproduced. Horizontal axis: time in days. Vertical axis: scores on Test B.]

FIG. 4. CURVES OF RETENTION FOR THE PUPILS IN THE LOWER ONE
THIRD OF SCORES ON TEST A
[Figure not reproduced. Horizontal axis: time in days. Vertical axis: scores on Test B.]
of their original score in one day while the pupils in the lower one third 
forgot twenty-eight per cent. For one week, the figures were thirty- 
nine per cent and forty-four per cent. When scores were corrected 
for guessing the upper one third forgot within one day thirty-three 
per cent while the lower one third forgot forty-nine per cent. It 
should be remembered that the upper and lower thirds referred to are 




based on the pupils’ scores on Test A. The data show that the pupils 
in the lower third tend to have a more rapid initial rate of forgetting. 
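
The uncorrected per cents of forgetting quoted above can be recovered from Table IV. The sketch below assumes that the original score for each third is the unweighted average of the immediate (B₁) means of Groups I and II, and that the delayed scores are the first-attempt means of Group III (one day) and Group IV (seven days). The text does not state the baseline used, so this is an illustration under those assumptions; the names are hypothetical.

```python
# Immediate Test B means of Groups I and II (Table IV), by third on Test A.
immediate = {"upper": [17.02, 17.18], "lower": [9.67, 10.01]}

# First-attempt Test B means after 1 day (Group III) and 7 days (Group IV).
delayed = {
    "upper": {1: 12.58, 7: 10.39},
    "lower": {1: 7.09, 7: 5.48},
}


def percent_forgotten(division: str, days: int) -> float:
    """Per cent of the immediate mean lost after the given number of days."""
    baseline = sum(immediate[division]) / len(immediate[division])
    return 100 * (baseline - delayed[division][days]) / baseline


for division in ("upper", "lower"):
    for days in (1, 7):
        print(f"{division} third, {days} day(s): "
              f"{percent_forgotten(division, days):.0f} per cent forgotten")
```

With these inputs the sketch gives twenty-six and twenty-eight per cent for one day and thirty-nine and forty-four per cent for one week, matching the figures in the text.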

Additional data on the relationship between learning ability and 
retention were secured by correlating the immediate score of pupils on 
Test A with their delayed score on Test B. The score on Test A was 
considered a measure of the pupils’ ability to learn, while the delayed 
score on Test B was considered as a measure of the pupils’ ability to 
retain. The “r’s” obtained in the above manner ranged from .60 for
one day to .44 for sixty-three days. Since the correlation between the
two tests on immediate recall was only .76, the “r’s” given for the relationship
between immediate and delayed recall, or between ability to


FIG. 5. CURVES OF FORGETTING FOR ITEMS NUMBER 9, 11, 15, AND 23.
(PER CENTS UNCORRECTED)
[Figure not reproduced. Horizontal axis: time in days. Vertical axis: per cent correct.]

learn and retention, are probably very conservative. In connection 
with the relationship between immediate and delayed recall, the atten- 
tion of the reader is called to the fact that in some previous studies this 
relationship was found by correlating the immediate and delayed scores 
on the same test. The results of such a procedure are almost certain 
to be spuriously high because of the effect of recall on retention. When 
this procedure was used in the present investigation, a correlation of 
.91 between the scores on immediate recall and recall after one day
was obtained. 


EFFECT OF ITEM DIFFICULTY ON THE FORM OF RETENTION CURVES 


From an item analysis of the papers of all pupils who took Test B, 
the data on forgetting of individual items shown in Table VI were 



















obtained. The first seven columns represent the per cent of pupils 
tested at various times who answered each item correctly. The last 
column shows the per cent of the pupils tested for previous knowledge 
who answered the items correctly. An examination of the data in 
Table VI will show that there is little relationship between the rate of 
forgetting of items and the initial difficulty of the items. This fact is 
shown graphically in Fig. 5. The four items (9, 11, 15, and 23) for 
which curves of retention are shown were approximately of equal 
difficulty on the initial test. Two of the items (11 and 23) were also
about equal on the previous knowledge score. In spite of these simi- 
larities, widely differing curves of retention were found. The one 
general characteristic of the data on each item is the fact that the rate 


TABLE VI.—THE PER CENT OF CORRECT RESPONSES FOR EACH OF THE ITEMS ON
TEST B

Item number    I and II    III    IV     V     VI    VII    VIII     X
 1                40        28    21    19    17     20     14      10
 2                61        30    20    15    16     16     15      14
 3                81        62    52    45    43     43     50      60
 4                80        56    38    36    31     33     28      18
 5                55        42    40    37    39     41     46      56
 6                76        57    57    55    59     61     61      66
 7                43        40    33    29    25     24     25      17
 8                32        24    18    19    22     18     19       9
 9                75        60    53    46    45     44     40      26
10                55        28    21    20    14     18     13      11
11                72        50    29    23    23     23     31      38
12                29        22    19    16    15     14     10       6
13                67        30    24    19    20     25     26      19
14                77        64    57    44    46     41     33      20
15                73        46    34    24    21     22     14       9
16                41        28    27    17    13     17     14       7
17                40        28    22    18    19     19     13       8
18                37        20    19    17    14     14      9       4
19                68        39    32    22    17     17      9       4
20                53        39    32    34    33     27     32      21
21                63        27    16    14    16     13     13      14
22                22        17    14    19    16     13     17       9
23                73        56    54    52    50     51     50      34
24                35        31    27    27    26     29     28      22
25                31        21    21    19    17     21     20      10





























SE of the per cents varies from 2.1 to 2.7. 




of forgetting is more rapid during the first day than during any sub- 
sequent period. 

The data for items 3, 5, 6, and 11 (Table VI) show that reading or 
study of material can be detrimental to success on a test over the 
content of that material. Children who had read the article made 
lower scores on some tests over these four items than those who had not 
read the articles. In the case of these four items, the difference 
ascribed to this interference was statistically significant. 

The data on items 13, 24, and 25 seem to be evidence for reminis- 
cence. However, none of the differences or gains ascribed to reminis- 
cence are statistically significant. 

When the amount of previous knowledge (last column in Table VI) 
is subtracted from the per cents given in the other columns of Table VI, 
a very different picture is presented. (See Table VII.) The difficulty 


TABLE VII.—ITEMS IN RANK ORDER OF PER CENT ANSWERED CORRECTLY AFTER
SUBTRACTION OF THE AMOUNT CREDITED TO PREVIOUS KNOWLEDGE

Item number    I and II    III     IV      V     VI    VII    VIII
15                64        35     25     15     17     13      5
19                64        35     28     18     13     13      5
 4                62        38     20     18     13     15     10
14                57        44     37     24     26     21     13
 9                49        34     27     20     19     18     14
13                48        11      5      0      1      6      7
21                47        13      2      0      2     -1     -1
 2                47        16      6      1      2      2      1
10                44        17     10      9      3      7      2
23                39        22     20     18     16     17     16
11                34        12     -9    -15    -15    -15     -7
16                34        21     20     10      6     10      7
18                33        16     15     13     10     10      5
17                32        20     14     10     11     11      5
20                32        18     11     13     12      6     11
 1                30        18     11      9      7     10      4
 7                26        23     16     12      8      7      8
 8                23        15      9     10     13      9     10
12                23        16     13     10      9      8      4
 3                21         2     -8    -15    -17    -17    -10
25                21        11     11      9      7     11     10
22                13         8      5     10      7      4      8
24                13         9      5      5      4      7      6
 6                10        -9     -9    -11     -7     -5     -5
 5                -1       -14    -16    -19    -17    -15    -10






































rank of some items changes considerably. However, even after this
change, no relationship between difficulty and rate of forgetting is
evident.


TABLE VIII.—THE EFFECT OF RECALL ON RETENTION FOR INDIVIDUAL ITEMS

               Group II     Group II     Group VIII
               per cent     per cent     per cent     Difference    Per cent     Difference    Per cent
Item number    correct      correct      correct      B₁ - B₃       forgotten    B₁ - VIII     forgotten
               on B₁        on B₃
 1                39           25           14            14           36            25           64
 2                61           40           15            21           34            46           75
 3                81           63           50            18           22            31           38
 4                80           67           28            13           16            52           65
 5                55           61           46            -6          -11             9           16
 6                76           81           61            -5           -7            15           20
 7                42           33           25             9           21            17           40
 8                31           17           19            14           45            12           39
 9                74           65           40             9           12            34           46
10                51           37           13            14           27            38           75
11                72           41           31            31           43            41           57
12                28           21           10             7           25            18           64
13                63           43           26            20           32            37           59
14                76           65           33            11           14            43           57
15                72           55           14            17           24            58           81
16                41           33           14             8           20            27           66
17                40           35           13             5           13            27           68
18                37           37            9             0            0            28           76
19                68           52            9            16           24            58           85
20                54           36           32            18           33            22           41
21                59           37           13            22           37            46           78
22                26           22           17             2            8             9           35
23                69           68           50             1            1            19           28
24                35           37           28            -2           -6             7           20
25                31           26           20             5           16            11           35





























In order to determine what effect recall had on the retention of 
individual items, a comparison of the amount forgotten by Group II 
(with recall) and Group VIII (without recall) was made. In each case 
the time elapsed was sixty-three days. The data for making this 
comparison are given in Table VIII. The data show that recall was 
beneficial to retention for every item. When the pupils were divided 
into an upper and a lower group on the basis of their Test A scores and 




data similar to that of Table VIII prepared, it was found that the 
superior pupils benefited more from recall. 
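
The per cents forgotten in Table VIII appear to be each difference expressed as a per cent of the Group II score on the first test. The sketch below checks this reading against three rows of the table; the reading itself is an inference from the tabled values rather than a stated method, and the names are hypothetical.

```python
# (per cent correct: Group II on B1, Group II on B3, Group VIII first attempt)
# for items 1, 2, and 4 of Table VIII.
items = {
    1: (39, 25, 14),
    2: (61, 40, 15),
    4: (80, 67, 28),
}


def percent_forgotten(initial: int, later: int) -> int:
    """Loss from the initial per cent correct, as a per cent of the initial value."""
    return round(100 * (initial - later) / initial)


for number, (b1, b3, viii) in items.items():
    with_recall = percent_forgotten(b1, b3)       # sixty-three days, with recall
    without_recall = percent_forgotten(b1, viii)  # sixty-three days, no recall
    print(f"item {number}: {with_recall} per cent forgotten with recall, "
          f"{without_recall} per cent without")
```

For these three items the computed per cents agree with the tabled ones, and in each case the loss without recall is roughly two to four times the loss with recall.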

The high scores made by the pupils of Groups I and II on the 
repeated tests might be attributed to the pupils’ remembering how the 
test was marked the first time. If this assumption were valid, then, 
both correct and incorrect responses would be repeated with equal 
consistency. In order to test the assumption, one hundred of the 
Test B₁ and Test B₂ papers of Group II were analyzed to determine the
consistency of responses. It was found that right answers were 
repeated seventy-nine per cent of the time; while wrong answers were 
repeated only fifty per cent. Since right answers were repeated more 
consistently than wrong answers, only a small portion of the unusually 
high scores on the second trial of the test can be ascribed to the pupils’ 
remembering how the test was marked the first time. Chance alone 
would give twenty per cent repetition of responses. The fact that 
wrong answers were repeated more often than chance would allow is 
not in conflict with the assumption that recall and not mere repetition 
is the factor that aids retention. In giving some wrong responses, the 
pupils were sometimes recalling information given in the article, for 
many of the false responses were taken from the content of the article. 
Thus, recall can also aid the retention of erroneous ideas. 

Limitations.—The findings of this study are subject to a number 
of limitations. The most important of these are the following: (1) 
The method of measurement employed a type of cue not used in 
recalling information in everyday school or life situations. (2) The 
tests were repeated. (3) The learning was of little practical use to 
the children. (4) The children were given no opportunity to refer to 
the material after the initial learning period. (5) The pupils did not 
know whether their responses were correct or incorrect. (6) The 


effect of reacting to one item upon the response to other items is not 
known. 


EDUCATIONAL IMPLICATIONS 


The findings of this study seem to warrant the following educa- 
tional implications: (1) Immediate recall in the form of a test is an 
effective method of aiding the retention of learning and should, there- 
fore, be employed more frequently in the elementary school. (2) 
Because recall can aid in fixing erroneous ideas, all tests or examina- 
tions should be returned to the pupil corrected, or the pupil should be 
given an opportunity to correct his own paper. (3) Achievement 




tests or examinations are learning devices and should not be considered 
only as tools for measuring achievement of pupils. (4) Since reading 
about a topic can interfere with the knowledge the subject already 
possesses, careful consideration should be given to what is supposed 
to be learned and to the thoroughness of learning. (5) In all studies 
where the same tests are repeated, the possible effects of recall on 
retention should be recognized. 




BIBLIOGRAPHY 


1. Dietze, A. G., and Jones, G. E.: “Factual Memory of Secondary-School Pupils
for a Short Article Which They Read a Single Time.” J. Ed. Psycho., Vol.
xxii, 1931, pp. 586-598, and 667-676.

2. Gates, A. I.: “Correlation of Immediate and Delayed Recall.” J. Ed.
Psycho., Vol. ix, 1918, pp. 489-496.

3. Jones, H. E.: “Experimental Studies of College Teaching.” Archives of
Psychology, No. 68, 1923.

4. Keys, Noel: “The Influence on Learning of Weekly as Opposed to Monthly
Tests.” J. Exp. Psycho., Vol. xxv, 1934, pp. 427-436.

5. Myers, G. C.: “Recall in Relation to Retention.” J. Ed. Psycho., Vol. v, 1914,
pp. 119-130.

6. Raffel, Gertrude: “The Effect of Recall on Recognition.” J. Exp. Psycho.,
Vol. xvii, 1931, pp. 828-837.

7. Yoakam, G. A.: The Effect of a Single Reading on the Retention of Various Types
of Materials in the Content Subjects of the Elementary-School Curriculum as
Measured by Immediate and Delayed Recall. Ph. D. Thesis, State University
of Iowa, 1922, p. 254.

8. Young, William E.: The Relation of Reading Comprehension and Retention to
Hearing Comprehension and Retention. Ph. D. Thesis, State University of
Iowa, 1930.










ATTITUDES OF HIGH-SCHOOL FRESHMEN TOWARD 
OCCUPATIONS OF THEIR CHOICE BEFORE AND 
AFTER STUDYING THE OCCUPATIONS BY 
MEANS OF A CAREER BOOK 


RICHARD M. BATEMAN 


Peru, Indiana, High School 
AND 
H. H. REMMERS 


Purdue University 


I. THE PROBLEM 


That one’s attitude toward one’s life work is one of the most impor- 
tant aspects of adjustment in the work-a-day world is attested both 
by popular belief and scientific investigation.¹ Any vocational
guidance in our schools should, therefore, take account of existing
attitudes of youth toward occupations one or another of which they 
will shortly have to enter. 

Guidance, including vocational guidance, has been more impressive 
with regard to good intentions and faith in its activities than it has with 
reference to objective analysis and evaluation of the outcomes of its 
techniques and procedures. High hopes and deep convictions when 
not a result of such critical evaluation are too frequently distortions of 
reality resulting from wishful thinking to be depended upon to square 
with objective reality. Validation of guidance activities in terms of 
measured results is in the long run not only desirable but essential to a 
sound guidance program. 

While there is in the literature a general dearth of studies designed 
to validate guidance procedures and techniques, this lack is particu- 
larly noticeable with reference to attitudes toward the various voca- 
tions that high-school students may choose. The study here reported 
had, therefore, for its general purpose the evaluation of attitude out- 
comes toward chosen vocations as affected by a defined guidance 
activity—the making of a career book by high-school pupils. 


II. PROCEDURE 


The study was made in four freshmen social science classes totaling 
one hundred seven pupils—thirty-nine boys and sixty-eight girls— 





1 Hoppock, R.: Job Satisfaction. Harpers, 1935. 


under the direction of Hubert Middlekauf at the Peru High School, 
Peru, Indiana. The students in these four experimental classes were 
enrolled in commercial, industrial, or home economics courses. In 
carrying out this experiment the authors used the Miller-Remmers 
Scale to Measure Attitude Toward Any Occupation (Forms A
and B).¹

The students in the four experimental classes were asked to select 
an occupation of primary interest to them as a future vocation. The 
students were then given an initial attitude test toward their chosen 
vocational preference. Each class was divided into two equivalent 
groups. One half of each class responded to Form A of the attitude 
scales while the other half responded to Form B, and then the procedure 
was reversed. 

After the four classes were measured as to their initial attitude 
toward their chosen occupation, an occupations bibliography and the 
following career book outline were given to each student:


OUTLINE FOR CAREER BOOKS 


Occupational Guidance—Peru High School 


(1) Cover—Make your own design. See books and magazines for ideas. 
Your cover should be very attractive and should suggest the occupation you 
are making the booklet on. 

(2) Title Page—See books for examples. Be careful of arrangement, pen- 
manship, and spelling. 

(3) Dedication—Your book should be dedicated to someone who has 
helped you to choose your career and who you feel has been of much encourage- 
ment to you. Tell why you dedicate the book to the person you do. 

(4) Preface—Read prefaces to several books for examples. It should tell 
the purpose of your book—what you tried to do and how you did it. 


(5) Table of Contents—Use chapter headings and page numbers. See any
books.

(6) List of Illustrations and Drawings—See any book. Name and page 
number.

(7) Personal Analysis Chart—Will be provided by instructor. 





1 Miller, H. E.: “The Construction and Evaluation of a Scale of Attitudes
toward Occupations.” Studies in Higher Education XXVI, Bulletin of Purdue
University, Vol. xxxv, No. 4, December, 1934, pp. 68-76.

Sigerfoos, Charles C.: “The Validation and Application of a Scale of Attitude
toward Any Vocation.” Studies in Higher Education XXXI, Bulletin of Purdue
University, Vol. xxxvii, No. 4, December, 1936, pp. 177-191.



CHAPTER HEADINGS 


Chapter I. Importance of Occupation. 
1. Future. 
2. Comparison with other occupations in the city. 
3. Opportunities for employment in this city. 
4. Steadiness of the work. Use as many pages as needed. 
Chapter II. Advantages. 
(a) Working conditions. 

(1) Pleasant and healthful. (2) Steady. (3) Regular hours. (4) Vacations
with pay. (5) Rest and recreation periods. (6) Safety.
(b) Nature of work.

(1) Sufficient variety to make it interesting. (2) Opportunities to be 

original. (3) Working with people. (4) Working with things. 
(c) Desirable effects on personality of worker. 

(1) Freedom from mental or physical strain. (2) Change of scene, 
opportunity to move about. (3) Opportunity for service to others. (4) 
Recognized position in community. (5) Satisfaction in work. 

(d) Training. 

(1) Short period of preparation. (2) Cost of training. (3) No unpro- 
ductive period while building up a business. (4) No necessity for further 
study after securing employment. 

Chapter III. Study of Occupation. 

(a) Work done. 

(1) Definition. (2) Kinds of workers. (3) Duties of workers. (4) 
Typical day’s work. 

Chapter IV. Disadvantages. 

(a) Working conditions. 

(1) Seasonal work. (2) Irregular hours. (3) No regular vacation with 
pay. (4) No rest. (5) Danger of accident and disease. (6) No oppor- 
tunity for training. 

(b) Nature of work.

(1) Monotonous, routine work. (2) Necessity to do original work. 

(3) Not recognized by community. (4) No chance to meet people. 
(c) Undesirable effects on personality. 

(1) Physical and mental strain. (2) No change of scene. (3) No 
opportunity for service. (4) Not recognized by community. (5) No chance 
to meet people. 

(d) Training. 

(1) Length of training period. (2) Expense. (3) Necessity for improve- 
ment and study. 

Chapter V. Education. 


1. Necessary. 







2. Desirable. 
3. Schools giving training for this occupation. 

(a) Elementary. (b) Trade school. (c) Junior high school. (d) College.
(e) Professional.

4. Experience (Possibilities of getting part time work in this job). 

Chapter VI. Personal Qualifications. 

Summarize your chart and tell your qualifications compared with those 
most needed for your chosen career. Be very fair with yourself and do not 
hesitate to say exactly what you think. 

Chapter VII. Compensation.

(a) Salaries. 

(1) Minimum. (2) Maximum. (3) Average. (4) Increases—When and 
how often. Are the wages sufficient to maintain a reasonable standard of 
living? (5) Professional possibilities. (6) Possible lines of advancement. 

Chapter VIII. Read and write a report on the biography of some one 
successful in your chosen occupation. 

Chapter IX. Write up an interview that you have had with someone 
successfully engaged in your chosen occupation. 

Note.—A bibliography is to be kept of each book or article consulted—give 
name of author in full, complete title, and remarks on content and usefulness 
of the material. 


GENERAL DIRECTIONS FOR CAREER BOOK 


(1) Write in ink or use typewriter. 

(2) Use a separate sheet of paper for each new topic. 

(3) Search through newspapers, books, magazines for pictures and articles. 
Be sure that every picture fits the topic—that it is neatly trimmed and 
labelled. 

(4) If you draw reasonably well, your book would be more valuable and 
interesting if illustrated by sketches which you may draw. 

(5) You may head each chapter with a quotation or a poem if you can 
find an appropriate one. 

(6) As a frontispiece you may have a poem, copied or original which 
should have the central thought of the book. 

(7) Each new topic with Roman numeral should head a new chapter on a 
new sheet of paper. 


The career book outlines were divided into units and each week one 
period of fifty minutes was devoted in each class to supervised work 
and discussion on the career booklets. Approximately sixteen weeks 
were devoted to the project. At the completion of the career study the 
students were again asked to respond to the attitude scales following 
the procedure used in the pre-tests. 




III. OCCUPATIONAL FIELDS INVESTIGATED 


A varied assortment of occupations was investigated by the 
students. A number of students made career books for occupations 
that they were not specifically preparing for in their particular high- 
school course of study. The girls in the four experimental classes were 
taking either commercial or home economics courses while the boys 
were taking industrial or commercial courses. Since none of these 
students were in a college preparatory course there was slight possibil- 
ity that those students who made career books for the engineering, 
medical, or legal professions would ever enter these occupations. 

The students’ chosen vocations were classified in the following six 
occupational areas: 

















Occupational area                          Boys   Girls   Total
1. [name illegible in source]                 7      31      38
2. [name illegible in source]                 9       5      14
3. [name illegible in source]                 9      31      40
4. Domestic and personal                      3       2       5
5. [name illegible in source]                 2       0       2
6. [name illegible in source]                 9       0       9





IV. CORRELATIONS OF ATTITUDE SCALES: FORM A VS. FORM B


The smaller r’s found in the initial test can be accounted for by the
small spread of the measures. The increase in the r’s on the post-tests
can be accounted for by the noticeable and significant increase in the
spread of the measures for the girls and total and the increased spread
for the boys.


TABLE I. Reliability of the Attitude Measurements, Form A vs. Form B

                    Girls          Boys           Total
                    N     r        N     r        N      r
Initial test        68   .19       39   .385      107   .26
Final test          68   .69       39   .65       107   .67


V. CORRELATIONS FOR PRE-TEST VS. POST-TEST ATTITUDE RESULTS


TABLE II. Correlations of Initial and Final Summed Scores for Occupations

                                 Girls          Boys           Total
                                 r     SE_r     r     SE_r     r     SE_r
Pre-test vs. post-test          .31    .11     .20    .15     .33    .09
Pre-test vs. post-test¹         .70    .06     .81    .05     .80    .03

1 Corrected for attenuation.


The results in Table II show the correlations for the pre- vs. post-
test attitude results. As is apparent from the corrected correlations
there was little shift in the relative position of the individuals in the 
distribution between the pre-test and post-test. 
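
The article does not print the formula behind the "corrected for attenuation" row or the standard errors. The following is a minimal sketch, assuming the usual Spearman correction (observed correlation divided by the square root of the product of the two reliabilities) with the Form A vs. Form B correlations of Table I taken as the reliabilities, and the classical standard error of a correlation, (1 - r^2) / sqrt(N). The function names are illustrative only. Under these assumptions the total-group row comes out close to the printed values; the girls' and boys' rows need not reproduce exactly.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman correction: observed r over sqrt of the product of reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


def se_of_r(r, n):
    """Classical large-sample standard error of a correlation coefficient."""
    return (1 - r ** 2) / math.sqrt(n)


# Total group: pre- vs. post-test r = .33 (Table II), Form A vs. Form B
# correlations .26 (initial) and .67 (final) from Table I, N = 107.
total_corrected = correct_for_attenuation(0.33, 0.26, 0.67)
total_se = se_of_r(0.33, 107)

print(round(total_corrected, 2), round(total_se, 2))
```

The corrected value lands near .79 against the printed .80, and the standard error rounds to the printed .09.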


VI. RESULTS OF EXPERIMENTAL DATA FOR OCCUPATIONS 


The results in this experiment and the comparisons are for the 
summed scores (Form A plus Form B) for the attitude scales used in 
this experiment. In all comparisons the critical ratios for the boys, 
girls, and total were computed from the formula for correlated 
measures. 
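
The article names "the formula for correlated measures" without printing it. A minimal sketch, assuming the standard textbook form SE_D = sqrt(SE_1^2 + SE_2^2 - 2 r SE_1 SE_2) and using the total-group means, standard errors, and pre- vs. post-test correlation reported in the article's tables; the function names are illustrative only:

```python
import math


def se_diff_correlated(se1, se2, r):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def critical_ratio(m1, m2, se1, se2, r):
    """Difference between the means divided by its standard error."""
    return (m1 - m2) / se_diff_correlated(se1, se2, r)


# Total group: pre-test mean 17.84 (SE .08), post-test mean 16.45 (SE .10),
# pre- vs. post-test correlation .33.
se_d = se_diff_correlated(0.08, 0.10, 0.33)
cr = critical_ratio(17.84, 16.45, 0.08, 0.10, 0.33)

print(round(se_d, 3), round(cr, 2))
```

With these rounded inputs the ratio comes out near 13.2; the printed 13.90 corresponds to dividing 1.39 by the rounded standard error of .10.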


TABLE III. Pre- and Post-test Averages of Summed Scores for Occupations

                      Pre-test                 Post-test
Sex       N       AM      SD    SE_M       AM      SD    SE_M
Girls     68     17.98    .65    .08      16.48   1.05    .13
Boys      39     17.61    .92    .15      16.38   1.03    .17
Total    107     17.84    .78    .08      16.45   1.04    .10


TABLE IV. Comparisons of Averages and SD's for Occupations. Summed Scores

                 Difference between means               Difference between SD's
Sex       N     M1 - M2   SE_diff   Ratio    Chances    SD diff   SE_diff   Ratio    Chances
Girls     68     1.50      .13     11.45¹     100        .40       .10      4.00¹     100
Boys      39     1.23      .19      6.83¹     100        .11       .15       .73       77
Total    107     1.39      .10     13.90¹     100        .26       .08      3.25¹     100

Ratio = difference divided by the SE of the difference. Chances = chances
in one hundred of a “true” difference.

1 Indicates a statistically significant difference.



From Table III it is evident that the experimental group had a
substantially favorable mean attitude toward occupations of their
choice. It should be explained that the scale values for the attitude
tests used run from 10.4 at the favorable end of the continuum to
0.6 at the unfavorable end. A value of 6.00 means a neutral attitude.
Since in this experiment the results are given in terms of summed scores
for the attitude scales (Form A plus Form B), the scale values will run
from 20.8 to 1.2 with 12.00 representing a neutral attitude. The mean
attitude before studying the occupation was 17.84 for the total, which
on the scales (Form A and Form B) is represented by the statements
“This occupation will mean a great deal to me when I am old”
(Form A, scale value 8.9) and “This work will bring one greater
respect from both oneself and others than most other jobs” (Form
B, scale value 8.9). Following the study of their chosen careers there
was a statistically significant decrease in the total students’ mean value
to 16.45. This value corresponds on the scales to “This job has several
very decided advantages over most other jobs” (Form A, scale
value 8.3) and “This vocation is interesting” (Form B, scale
value 8.3).

[Figure 1. Attitude of High School Students Toward Various Occupations
Before and After Making Career Books for Occupations. Frequency
distribution of summed scores, before and after, for the 68 girls.]

[Figure 2. Attitude of High School Students Toward Various Occupations
Before and After Making Career Books for Occupations. Frequency
distribution of summed scores, before and after, for the 39 boys.]

A further very significant fact is the increase in the spread of the
final test as measured by the standard deviation. This same tendency
was present in a previous experiment by Remmers and Whisler¹ in
which they made the following statement concerning the effects of
guidance on vocational preference: “This suggests that stereotypes,
fictitious patterns of ideas, about vocations were to some extent broken
down by the guidance programs and replaced by more individualistic
attitudes. A stereotype always involves a distortion and generally an
oversimplification of the thing to which it refers. Thus the stereotype
of aviator may be a man who works with engines, takes risks, and gains
fame. The number of boys with favorable attitudes toward this
picture is necessarily different and tends to be larger than the number
with favorable attitudes toward the more detailed picture of the
aviator.”

[Figure 3. Attitude of High School Students Toward Various Occupations
Before and After Making Career Books for Occupations. Frequency
distribution of summed scores, before and after, for boys plus girls
(107). Vertical axis: number of students.]

1 Remmers, H. H. and Whisler, L. D.: “The Effects of a Guidance Program on
Vocational Attitudes.” Studies in Higher Education XXXIV, Bulletin of Purdue
University. Series III. September, 1938, p. 75.

Tables III and IV show the results when analyzed for boys alone, 
girls alone, and totals. The most significant shift in the spread of the 
measures is for the girls, critical ratio 4.00; only moderate shifts in
spread were noted for the boys as indicated by the critical ratio of .73. 
The total group showed a significant shift in their spread as indicated 
by the critical ratio of 3.25. 

Statistically reliable decreases were found in the mean attitude for 
girls, boys, and total. Evidently learning about a chosen vocation 
tends to remove the halo from the vocation and tends to lead toward 
more realistic attitudes. The total group had an attitude shift of 1.39,
and it was significant as indicated by the critical ratio of 13.90. 

A study of the frequency distributions for the pre- and post-attitude 
tests as found in Figs. 1, 2, and 3 shows the direction of the shift in 
attitude. 


VII. CONCLUSIONS 


In summarizing the results of this study it should be noted that the 
purpose and method of the experiment are at least as significant as are 
the specific quantitative results. The quantitative data warrant the 
following conclusions: 

(1) Career books do produce significant changes in attitude in 
students toward occupations. 

(2) The students were on the average less favorable toward the 
occupations of their choice after making career books. 

(3) The girls’ initial and final mean attitude scores were more 
favorable than those of the boys. 

(4) The girls’ mean attitude difference between their initial and 
final test was greater than that for boys. 

(5) The girls tended toward greater variability in the final test 
than did the boys. 

(6) The same general tendencies were noted in the spread of the 
measure for girls, boys, and total—greater variability in the final 
attitude scores.

(7) The increase in spread may be interpreted as giving evidence of 
the breakdown of a group stereotype and the development of indi- 
vidualistic attitudes. 









































CHANGES IN READING PERFORMANCE DURING 
THE FRESHMAN YEAR OF COLLEGE 


ROBERT M. BEAR AND HENRY A. IMUS 
Dartmouth College 


It is generally assumed that the curricular assignments in the 
first year of college require considerable usage of the basic skills 
in reading. As a result of the exercise of these skills and because 
of the operation of more general factors, such as maturation, changes 
may be expected during this period. Additional information regard- 
ing the extent of the changes and their nature is desirable. With this 
end in view and as part of a more comprehensive program of research,¹
study was made of the freshman class entering Dartmouth College in 
September, 1936, to determine (1) what changes occur in reading 
performance during the academic year as measured by a reading test, 
and (2) possible conditions and factors having a bearing on the changes. 


PROCEDURE 


Among the tests given to all freshmen during the opening week of 
the college year were the Iowa Silent Reading Test, Form B, and the 
American Council Psychological Examination. Approximately eight 
months later, on May 7, the same students were given Form A of the 
Iowa Test, and the changes in scores from the previous September in 
comprehension and rate of reading were computed. On each occasion 
the paragraphs of the rate test were rescored and the number of words 
read per minute ascertained, thus making available another index of 
change in rate of reading. 

Work conducted by a member of the psychology department for 
improving reading reaches over two hundred freshmen each year. 
Three-fourths of these students request to be enrolled, many being 
average and some even superior readers. The remaining one fourth 
are well below average either in rate or comprehension or both at the 
time of the opening of college. For the purposes of this study it was, 
therefore, necessary to exclude all such men who had had any contacts, 
however slight, with the reading instruction. When the separation 
was made there remained three hundred eighty-three freshmen, a
group with slightly higher comprehension scores than was typical of
the distribution for the class as a whole but undistinguished in rate of
reading. The September mean scores in comprehension and rate for
the entire class were 156.6 and 37.7, respectively, while the correspond-
ing mean scores were 166.4 and 36.8 for the three hundred eighty-three
students. The changes in reading for this latter group are the subject
of this report.

1 Imus, H. A., Rothney, J. W. M., and Bear, R. M.: An Evaluation of Visual
Factors in Reading. Dartmouth College Publications, 1938.


FINDINGS 


Changes in Reading Performance during the Academic Year.—The 
Iowa Test scores for September and May are found in Table I. While
there was a significant mean gain of 13.7 points in comprehension, the
rate score remained unchanged and there was an increase of only
nine and seven-tenths words read per minute. Neither of the differ-
ences in these measures of speed of reading is statistically reliable,
which allows us to presume that chance may have been one of the
factors in whatever change did occur. Examination of individual
records showed that with respect to comprehension eighty-three and
eight-tenths per cent of the freshmen gained in score, one and eight-
tenths per cent did not gain, and fourteen and four-tenths per cent lost.
In number of words read per minute only sixty and six-tenths per cent
gained, eight-tenths per cent remained unchanged while thirty-eight
and six-tenths per cent lost.


TABLE I. Comparison of September and May Iowa Silent Reading Test
Scores for Three Hundred Eighty-three Dartmouth College Freshmen

                               Mean scores
Iowa test                   September     May     Differences   Critical ratios
Comprehension                 166.4      180.1       13.7            5.7
Rate                           36.8       36.6       -0.2            0.4
Words read per minute         255.9      265.5        9.7            2.3

Possible Factors Bearing on Changes.—Having found that a small 
but significant gain in comprehension was obtained by the group with 
little change in speed of reading, investigation was made of possible 
factors which might have a bearing upon these facts. In view of the 
moderate positive correlation usually obtained between reading test 
and scholastic aptitude test scores, it seemed worth while to study this
relationship more in detail. The percentile ranks of the freshmen
on the Psychological Examination which was given in September
were consulted and the men divided into the five groupings shown
in Table II.


TABLE II. Initial Mean Scores and Gains during Year on Iowa
Comprehension and Words Read per Minute for Freshmen Grouped
by Percentile Rank on the Psychological Examination

                               Comprehension          Words read per minute
Percentile                  September     Mean        September     Mean
rank           Number         mean        gain          mean        gain
80-99+           107         179.4        12.4         277.5         2.1
60-79            101         168.9        12.5         254.0        11.0
40-59             78         161.8        13.2         246.2        11.0
20-39             59         154.2        16.2         245.2         8.8
0-19              38         144.6        13.2         232.1        17.3


The mean gains and losses between the September and
May tests on the Iowa were determined for each group. It will be
observed that at the various levels of scholastic aptitude, the mean
gains in comprehension are relatively much the same, being only
slightly higher for those in the lowest two-fifths of the class than for
those in the highest two-fifths. The similarity in magnitude of these
gains suggests the influence of some common factor, such as a possible
vocabulary growth, resulting from courses taken by all students. To
check the latter, the mean gain in vocabulary as measured by the
Iowa was computed for the entire group of three hundred eighty-three
students and found to be 2.6 points, a gain equivalent to nineteen and
one-tenth per cent of their total absolute gain in comprehension score
of 13.7 points. Since in the total comprehension score possible on the
Iowa the vocabulary test contributes twenty-nine and nine-tenths
per cent, the vocabulary growth of these students does not appear to
have had disproportionate influence. 
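
The comparison in the preceding paragraph is a simple ratio check. A minimal sketch, using only the rounded figures quoted in the text (2.6 points of vocabulary gain, 13.7 points of total comprehension gain, and a vocabulary weight of 29.9 per cent); the function and variable names are illustrative only:

```python
def share_of_gain(part_gain, total_gain):
    """Return the part's share of the total gain, in per cent."""
    return 100.0 * part_gain / total_gain


vocab_share = share_of_gain(2.6, 13.7)   # vocabulary gain as share of total gain
vocab_weight = 29.9                      # vocabulary's share of the possible score

# Vocabulary growth would be disproportionate only if its share of the gain
# exceeded its weight in the total comprehension score.
disproportionate = vocab_share > vocab_weight

print(round(vocab_share, 1), disproportionate)
```

The rounded inputs give a share of about 19.0 per cent; the article's 19.1 presumably rests on unrounded means. Either value is well below the 29.9 per cent weight.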

The mean gains in words read per minute were again rather uni- 
form for the middle groups, but those at the lowest levels of scholastic 
aptitude far outgained those at the highest. 

Another condition which might be expected to have some influence 
upon subsequent reading development was initial proficiency in 
comprehension skills. Do men high in comprehension greatly improve 
reading speed or slow down after entering college? What happens
to those initially low in comprehension? Is there any evidence of
compensation? The freshmen were divided into five groups according
to the Dartmouth percentile ranks of their comprehension scores on
the Iowa Test in September and the changes from the May Test
tabulated. The results appear in Table III. While a consistent
tendency was discovered for the mean gains in comprehension to be
larger at successively lower levels of initial comprehension, there was
little evidence of any trend except toward smallness of gain in terms
of words read per minute.


TABLE III. Mean Gains during Year on Iowa Comprehension and Words
Read per Minute for Freshmen Grouped by Dartmouth Percentile
Rank of Comprehension Score on September Iowa

                                        Mean gains
Percentile rank    Number    Comprehension    Words read per minute
80-99+               115          6.5                  6.6
60-79                102         13.5                 14.4
40-59                 79         14.6                  3.2
20-39                 66         20.2                  9.3
1-19                  21         23.3                 14.3

Is initial rate of reading a factor in changes during the freshman
year? Investigation of trends was made in the same manner as above.
The students were separated into five groups on the basis of their Dart-
mouth percentile ranks for the score made in rate on the Iowa Test in
September. The mean gains of these groups as seen in Table IV
rather reversed the trends of the preceding classification. Gains in
comprehension were quite small from one level of initial rate to another,
while mean gains in words read per minute increased consistently
from the higher to lower levels. From this it would appear that
students with very slow rates at the beginning of the freshman year
might be expected to show an improved speed if tested by the Iowa
Test eight months later. Even if some unreliability is suspected in the
techniques of measuring or scoring the speed of reading, the trend of
the changes gives evidence of genuine gains for the students who were
initially quite deficient in rate.


TABLE IV. Mean Gains during Year on Iowa Comprehension and Words
Read per Minute for Freshmen Grouped by Dartmouth Percentile
Rank of Rate Score Made on September Iowa

                                        Mean gains
Percentile rank    Number    Comprehension    Words read per minute
80-99+               109         12.3                 -9.6
60-79                110         14.0                  8.7
40-59                 99         14.4                 21.9
20-39                 53         13.0                 17.0
1-19                  12         11.3                 38.9

The differences in changes at corresponding percentile levels for 
comprehension and speed as compared in Tables III and IV suggest 
that different individuals were classified in these levels on the two 
bases. What would be the result if a comparison were made which 
separated those low both in comprehension and rate from those 
average in both or from those superior in both? To investigate this
possibility, the Dartmouth percentile rank in both comprehension and
rate of each man was noted to determine in which quarter of the class
it fell. For comprehension, Quarter I contained those scoring between
1 and 139; Quarter II those scoring between 140 and 158; Quarter III
those scoring between 159 and 173; and Quarter IV those scoring 174
or above. For rate, Quarter I contained those scoring between 1
and 27; Quarter II those scoring between 28 and 32; Quarter III those
scoring between 33 and 37; and Quarter IV those scoring 38 or better.
Each student’s rank in both comprehension and rate was then deter-
mined and all those having similar combinations of rank in rate and
comprehension were placed together. Thirteen different combinations
were found, but it was observed that they fell into the five categories
reported in Table V. Reading changes for the students of each
category are given.


TABLE V. Mean Gains during Year on Iowa Comprehension and Words
Read per Minute for Freshmen Grouped by Percentile Rank on
Both Comprehension and Rate Scores Made on September Iowa

                                                               Mean gains
Group                                           Number   Comprehension   Words read per minute
Low in both comprehension and rate                 31        19.2              19.8
Low in comprehension and high in rate              30        23.3             -13.3
Average in both comprehension and rate            150        14.9              15.1
High in comprehension and low in rate              17         6.5              33.5
High in both comprehension and rate               144        10.1               2.1


A balancing up of extreme deficiency by students
was suggested by the data. The largest gains in comprehension
were made by those originally low in comprehension and high in rate;
the largest gains in speed by those originally low in rate but high in
comprehension. The small numbers involved, however, leave in doubt
anything more quantitative than a suggestion of the direction of the
trend of change. Those high in September in both rate and com-
prehension had little increase in speed and were slightly under
the mean gain of 13.7 points for comprehension for the entire group
of three hundred eighty-three freshmen. Their failure to increase
speed and the loss in words read per minute on the part of those
initially low in comprehension and high in rate appear to be factors
in the slight gains and lack of consistent trend in change observed in
Tables I, II and III. The fact that the one hundred fifty freshmen
originally average in both comprehension and rate made mean gains
of 14.9 points in the former and of fifteen and one-tenth words read
per minute in the latter should be noted. Incidentally, a similar
analysis of gains for the students participating in the reading instruc- 
tion showed those in groups corresponding to the first three of Table V 
had slightly greater gains in comprehension and far greater gains in 
speed than those reported for these groups above. 
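
The quarter boundaries quoted in the paragraph above can be sketched as a small lookup. This is a minimal illustration, assuming integer Iowa scores and using only the cut points stated in the text; the names are hypothetical, and the further grouping of the thirteen observed combinations into the five categories of Table V is not modeled because the article does not spell it out.

```python
from bisect import bisect_right

# Lowest score belonging to Quarters II, III, and IV, as quoted in the text.
COMPREHENSION_LOWER_BOUNDS = [140, 159, 174]
RATE_LOWER_BOUNDS = [28, 33, 38]


def quarter(score, lower_bounds):
    """Return the quarter (1 to 4) in which a score falls."""
    return bisect_right(lower_bounds, score) + 1


def classify(comprehension, rate):
    """Return the (comprehension quarter, rate quarter) pair for one student."""
    return (
        quarter(comprehension, COMPREHENSION_LOWER_BOUNDS),
        quarter(rate, RATE_LOWER_BOUNDS),
    )


# A hypothetical student scoring 150 in comprehension and 40 in rate.
print(classify(150, 40))
```

A student in Quarter I on one measure and Quarter IV on the other would belong to one of the two mixed groups of Table V.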




SUMMARY AND CONCLUSION 


(1) All freshmen of the Class of 1940 having had no contact with
the College’s reading improvement program were studied to determine 
changes in reading performance on the Iowa Silent Reading Test. 

(2) This group of three hundred eighty-three students was found
to have a mean gain of 13.7 points in comprehension score, a mean loss 
of 0.2 points in rate score and a mean gain of nine and seven-tenths 
words read per minute on the paragraphs of the rate test. Only the 
comprehension gain proved statistically reliable. 

(3) Comparison was made of the gains in comprehension and speed 
of the students when grouped according to their September percentile 
ranks at Dartmouth on the Psychological Examination and on the 
Iowa comprehension and rate, respectively. Two findings were noted: 




First, that rather consistent mean gains of between twelve and 
fifteen points in comprehension were registered. It seems probable 
that these may be interpreted in part as the results of maturation 
under the stimulus of the college environment and the faster rate of 
reading developed by those very slow at the time of the September 
test.1 Comprehension gains larger or smaller than the above were 
chiefly confined to those who in September were, respectively, most 
deficient or most superior in comprehension as measured by the Iowa, 
rather than being dependent upon level of a student’s scholastic 
aptitude. 

Second, that no general consistency of gains in words read per 
minute was found, but changes ranging from loss of speed to moderate 
gains seemed to be more dependent than comprehension upon the 
original status in comprehension and rate of the individual when he 
entered college. If average in both at that time, he made the average 
gain in comprehension score and a little better than average increase in 
words read per minute. (See Table V.) If low in both at the time of 
college entrance, gains in both comprehension and speed were some- 
what better still. If high in both, gain in comprehension was slightly 
below the average gain of the entire group and increase in speed was 
negligible. 

The conclusion suggested by the findings is that students, apart 
from maturation and unless initially extremely deficient or given 
training, make rather limited improvement during the freshman year 
in their reading techniques as measured by the Iowa Test. 





1 A faster rate enables one to cover more items on the comprehension measures 
and thus would enable the originally very slow reader to improve his score on 
comprehension on the Iowa Test. 










GENERAL CONSIDERATIONS IN THE SELECTION OF 
TEST ITEMS AND A SHORT METHOD OF ESTIMAT- 
ING THE PRODUCT-MOMENT COEFFICIENT 
FROM DATA AT THE TAILS OF THE 
DISTRIBUTION¹ 


JOHN C. FLANAGAN 
Coöperative Test Service of the American Council on Education 


This paper is essentially a progress report concerning a series of 
studies of the procedures for the selection of test items. The present 
discussion will review briefly the arguments for the use of statistical 
criteria in the selection of test items, point out fallacies in previous 
studies and discussions of these statistical considerations, and present 
a short method of obtaining validity coefficients. 

Unless the author of a number of test items writes only items of an 
entirely uniform degree of excellence, in which case, of course, no 
improvement by any method of selection would be possible, the refine- 
ment and improvement of a test usually requires the use of empirically 
obtained statistical indices of the characteristics of the item. The 
importance of such statistical indices may be illustrated by quoting 
from a previous report by the present writer² “. . . the reliability 
coefficient of a test composed of one hundred items having item inter- 
correlations of .15 would be .95. If another group of one hundred 
items having intercorrelations of .03 and measuring the same general 
function are added to this test, the reliability coefficient for the total 
test composed of two hundred items is slightly less than the previous 
value.” A table is being prepared which will provide a simple means 
of determining which ones of a group of experimental items should be 
included in the final form to obtain the maximum reliability coefficient 
for the finished test. The proportion of items which should be used 
would depend on the degree of excellence and heterogeneity of the 
original items. It appears quite obvious that some published tests 
would have been improved had half of the items not been included. 
The resulting increase in efficiency appears very desirable at the present 
time when so many demands are being made on the students’ time. 
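
The arithmetic behind the quoted figures can be sketched as follows. The first function is the Spearman-Brown formula applied to items of equal intercorrelation. The second rests on an assumption that is not in the quoted passage: the correlations between items of the two groups are not given, so the sketch supposes that all items share a single common factor, which makes an item from the first group correlate about .067 with an item from the second. The function names are illustrative only.

```python
import math


def reliability_equal_items(n_items, r):
    """Reliability of a sum of n_items items whose intercorrelations all equal r
    (the Spearman-Brown formula applied to a single item)."""
    return n_items * r / (1.0 + (n_items - 1) * r)


def reliability_one_factor(groups):
    """Reliability of a sum of standardized items sharing one common factor.

    groups is a list of (n_items, within-group intercorrelation) pairs.
    ASSUMPTION (not in the quoted passage): each item's loading on the common
    factor is the square root of its within-group intercorrelation, so items
    from different groups correlate sqrt(r1 * r2).
    """
    common = sum(n * math.sqrt(r) for n, r in groups) ** 2
    unique = sum(n * (1.0 - r) for n, r in groups)
    return common / (common + unique)


first_hundred = reliability_equal_items(100, 0.15)
both_hundreds = reliability_one_factor([(100, 0.15), (100, 0.03)])
print(round(first_hundred, 3), round(both_hundreds, 3))
```

Under these assumptions the first hundred items give a coefficient of about .95, and the two hundred items together give a value very slightly below it, which agrees with the direction of the quoted statement. A different pattern of cross-correlations would give a different second figure.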





1 Read before the Eastern Psychological Association, March 31, 1939, at 
Bryn Mawr, Pennsylvania. 
2 Flanagan, John C.: “A Proposed Procedure for Increasing the Efficiency of 
Objective Tests.” J. Educ. Psychol., Vol. xxviii, 1937, pp. 17-21. 


In selecting items, there are two primary considerations: First, 
“Is the item valid?” That is, does it discriminate between persons 
having much of the quality being measured and persons having only a 
relatively small amount of this quality? This question is usually 
answered in terms of some statistical index of the validity of the test 
item. The second consideration is, ‘‘Is the difficulty level of the item 
suited to the group for which it is intended?” It has frequently been 
pointed out that items which either all students or no students get 
correct are performing no measuring function in the test. 

Although much time and energy have been spent on empirical 
studies in an effort to discover the most satisfactory procedure for 
selecting the best items for inclusion in a final form of a particular test, 
these studies have produced conflicting results, and have conspicuously 
failed to settle the issue. A favorite type of empirical study has been 
the comparison of several methods of obtaining validity coefficients. 
Most of these studies are of little practical value because the experi- 
menters have failed to control the effect of item difficulty on test 
validity. It should be emphasized that these are two separate con- 
siderations. As will be mentioned later, items of fifty per cent 
difficulty tend in certain situations to provide the most valid test. 
Therefore, a method which combines a rough measure of validity or 
discriminating value with a device which will favor items of fifty per 
cent difficulty will tend to appear to be superior to a method which 
provides a more valid index of item validity unaffected by difficulty. 
Obviously, in a practical situation, these two factors of item validity 
and item difficulty should be given separate consideration, and an index 
which obscures the estimate of validity by combining it with a difficulty 
characteristic is to be avoided. Similarly, empirical studies of the 
effect on test validity of item difficulty have neglected the factor of 
item validity. 

In addition to these empirical studies, there have been several 
logical discussions of statistical criteria for selecting test items. 
Several investigators have brought forth logical proofs to show that a 
test should be composed of items of fifty per cent difficulty. Here, 
again, the proofs are based on assumptions which are artificial and 
fallacious in representing typical conditions. It can easily be shown 
that, to obtain the maximum amount of discrimination between the 
individuals in a particular group, a test should be composed of items 
all of which are of fifty per cent difficulty for that group, provided the 
intercorrelations of all the items are zero. This situation obviously 
never exists. It can also be shown that a rectangular distribution of 
item difficulties extending from the level of ability of the highest 
individual in a group to the level of the lowest individual in the group 
is necessary to obtain maximum discrimination among the members 
of the group, provided the intercorrelations between all of the items are 
unity. This situation also must be regarded as a purely hypothetical 
one. The practical situation is one which is intermediate between 
these extremes. Therefore, we may dismiss the notion that all items 
should be of fifty per cent difficulty as one based on a hypothetical 
situation which is contrary to fact. The decision concerning the most 
desirable distribution of item difficulties for a particular test should be 
based on the accuracy of measurement desired at various levels and the 
intercorrelations of the items affecting scores at these levels. 
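
The second of these limiting cases can be illustrated by counting the pairs of individuals that a test places in different score classes. The sketch below is only an illustration under stated assumptions: it treats the unity-intercorrelation case as a perfect scale, in which anyone passing a harder item passes every easier one, and the function names are hypothetical. A single item passed by a proportion p of N persons separates N²pq pairs, which is greatest at p = .50; with perfectly intercorrelated items, however, repeating the same difficulty adds nothing, and evenly spread difficulties separate more pairs.

```python
def score_class_sizes(pass_counts, n_people):
    """Sizes of the score classes when all items are perfectly intercorrelated.

    pass_counts[i] is the number of people passing item i. Under perfect
    intercorrelation the people passing a hard item are a subset of those
    passing any easier item, so the sorted pass counts mark the class borders.
    """
    bounds = [0] + sorted(pass_counts) + [n_people]
    return [high - low for low, high in zip(bounds, bounds[1:])]


def discriminated_pairs(class_sizes):
    """Number of pairs of people who fall in different score classes."""
    n = sum(class_sizes)
    return (n * n - sum(size * size for size in class_sizes)) // 2


all_at_fifty = score_class_sizes([50, 50, 50, 50], 100)
evenly_spread = score_class_sizes([20, 40, 60, 80], 100)
print(discriminated_pairs(all_at_fifty), discriminated_pairs(evenly_spread))
```

With one hundred persons, four perfectly intercorrelated items all at fifty per cent difficulty leave only two score classes, whereas four items at twenty, forty, sixty, and eighty per cent give five equal classes and separate more pairs.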

Although the two factors which have been mentioned are usually 
regarded as the primary considerations in the selection of test items, 
they are by no means the only considerations. For example, item 
intercorrelations, although shown to be of only minor importance for 
certain types of tests, may, in particular situations, be of paramount 
importance. Some statistical methods have been devised for taking 
item intercorrelations into account, such as the approximation procedures 
devised by the present writer in 1934 and 1936.¹ However, the 
typical method of controlling item intercorrelations has been some 
relatively crude method such as the selection of items with respect to 
various logical categories. This procedure is commonly known as one 
of obtaining adequate sampling. 

Two other considerations of importance in particular situations are 
length of time required per item, and objectivity of scoring. It need 
hardly be said that a test of one hundred items requiring only twenty 
minutes of testing time would, in general, be definitely superior to one 
of fifty items requiring the same amount of testing time, even though 
the average of the item validities in the first case was somewhat lower. 
Objectivity of scoring is almost entirely a practical consideration, but 
becomes of great importance in certain situations. 

1 Flanagan, John C.: Factor Analysis in the Study of Personality. Stanford 
University Press, Stanford University, California, 1935. 

Flanagan, John C.: “A Short Method for Selecting the Best Combination 
of Test Items for a Particular Purpose.” Psychol. Bull., Vol. xxxiii, 1936, pp. 
603-604. 

The final pair of criteria which will be mentioned are illustrated by 
such competitive examinations as those given by Civil Service Commissions. 
These criteria are “face-validity” and defensibility or 
authoritativeness of the correct answer. By “face-validity” is meant 
the requirement that examinations appear to measure what is popularly 
understood from the title. By defensibility or authoritativeness is 
meant the ability of the examiners to convince interested parties that 
the answer given by the examiners is the correct answer. Although 
these are definitely secondary considerations from the measurement 
point of view, they frequently are of great practical importance. 

Although it is clear that these secondary considerations can rarely 
be neglected, the major considerations in most situations remain those 
considered above; namely, item validity and item difficulty. 

It follows from the definition given above for item validity that the 
best index of validity is one which provides an index of the extent to 
which an item will predict the criterion. Such an index is provided 
by the product-moment correlation coefficient and its various modifica- 
tions. The most common situation is one in which the biserial correla- 
tion coefficient applies. These coefficients have been widely used and 
various procedures have been developed for reducing the relatively 
large amount of time and effort required for their computation. Even 
though these procedures have materially decreased the labor involved, 
many individuals feel that the time expended in obtaining these 
coefficients is frequently not justifiable. Such individuals have made 
considerable use of the upper- and lower-groups method. T. L. Kelley 
reported a number of years ago that, if upper and lower groups were to 
be used, the certainty with which the means of the upper and lower 
groups are differentiated is a maximum when the two tails of the normal 
distribution each contain twenty-seven per cent of the cases. Kelley¹ 
has recently amplified this statement showing that in certain situations 
a slightly smaller proportion of all cases appears desirable. He concludes, 
however, that “Upper and lower groups consisting of twenty-seven 
per cent from the extremes of the criterion score distribution are 
optimal for the study of test items, provided the differences in criterion 
scores among the members of each group are not utilized.”² 
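
One way to see where a figure near twenty-seven per cent can arise is sketched below. It is a simplified criterion and not Kelley's own derivation. It assumes a normally distributed criterion and a within-group variance that is roughly the same whatever the size of the tails, as would hold when the item is only weakly related to the criterion. The difference between the criterion means of the two tails is then proportional to the normal ordinate at the cut divided by the tail proportion, and its standard error is proportional to one over the square root of the tail proportion. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # unit normal criterion (assumption)


def separation_index(tail_fraction):
    """Difference between tail means divided by its standard error,
    up to a constant factor, for tails each holding tail_fraction of cases."""
    z_cut = STD.inv_cdf(1.0 - tail_fraction)
    return STD.pdf(z_cut) / math.sqrt(tail_fraction)


def best_tail_fraction(step=0.001):
    """Scan tail fractions from step to one half and return the best one."""
    candidates = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    return max(candidates, key=separation_index)


print(round(best_tail_fraction(), 3))
```

Under these assumptions the scan settles close to twenty-seven per cent in each tail; using half the cases in each group gives a clearly smaller index.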

1 Kelley, T. L.: “The Selection of Upper and Lower Groups for the Validation 
of Test Items.” J. Educ. Psychol., Vol. xxx, 1939, pp. 17-24. 
2 Ibid., p. 24. 

The upper- and lower-groups method has been quite extensively 
used in connection with a chart which provides a graphic indication of 
the separation, in terms of quarter-sigma units, of the means of the 
upper and lower groups on the particular item. In many situations, it 
appears advantageous to have the index in terms of the degree of 
relationship shown between the item and the criterion. For example, 
such coefficients facilitate thinking in terms of the relation between 
item validity and test validity. The writer, therefore, has prepared a 
chart based on Tables VIII and IX in Tables for Statisticians and 
Biometricians, Part II, edited by Karl Pearson. Pearson’s tables give 
volumes of the normal bivariate surface included in any cell whose 
lower limit is 0.0, 0.1, 0.2, . . . 2.6 standard deviations, and whose 
upper limit runs to infinity, for specified correlations at intervals of 
.05 from −1.00 to +1.00. 

The derived chart (Fig. 1) shows the values of the product-moment 
coefficient of correlation corresponding to given proportions of success 
in the upper and lower twenty-seven per cent of the criterion group. 

[Fig. 1. Values of the Product-Moment Coefficient of Correlation] 
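
The relation that such a chart tabulates can be approximated numerically. The sketch below is not the procedure by which the chart was drawn from Pearson's tables; it is a stand-in under stated assumptions: the criterion and a latent item variable follow the normal bivariate surface, the item is passed when the latent variable exceeds a threshold, and volumes are found by Simpson's rule rather than from tables. All function names and the step counts are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()
TAIL = 0.27
K = STD.inv_cdf(1.0 - TAIL)  # criterion cut-off for the upper 27 per cent


def pass_rate_in_tail(h, r, upper=True, steps=100):
    """P(item passed | criterion in the upper or lower tail), by Simpson's rule.
    h is the item threshold and r the correlation (both model assumptions)."""
    s = math.sqrt(1.0 - r * r)
    lo, hi = (K, K + 8.0) if upper else (-K - 8.0, -K)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(y) * STD.cdf((r * y - h) / s)
    return total * width / 3.0 / TAIL


def bisect(f, lo, hi, iterations=40):
    """Root of an increasing function f on the interval [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def threshold_for(p_lower, r):
    """Threshold that reproduces the lower-group success rate for a given r."""
    return bisect(lambda h: p_lower - pass_rate_in_tail(h, r, upper=False), -6.0, 6.0)


def estimate_r(p_upper, p_lower):
    """Correlation matching the success rates of both 27 per cent groups."""
    def gap(r):
        h = threshold_for(p_lower, r)
        return pass_rate_in_tail(h, r, upper=True) - p_upper
    return bisect(gap, -0.99, 0.99)


p_u = pass_rate_in_tail(0.2, 0.5, upper=True)
p_l = pass_rate_in_tail(0.2, 0.5, upper=False)
print(round(estimate_r(p_u, p_l), 3))
```

The usage lines generate the two success rates from a known correlation and then recover it, which checks the sketch against itself only. No claim is made that its values agree with the printed chart to the accuracy of Pearson's tables.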

A project to determine the standard error of a correlation coefficient 
obtained by the use of this and similar charts is now in progress. It is 
clear that such a chart, utilizing as it does the information from only 
about half of the cases and lumping these cases together into only two 
groups, will give much less accurate results than does the more usual 
biserial correlation coefficient. However, the results obtained from 
this chart have been found to be satisfactory approximations to the 
biserial coefficients in the comparisons which have been made by the 
writer. 

In practice, it appears that frequently it is satisfactory to use the 
values obtained from this chart together with an index of difficulty 
found by averaging the difficulties for the upper and lower groups. 
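
The difficulty index just mentioned is a simple average, as the following sketch shows. The counts are invented for illustration: with four hundred cases, each twenty-seven per cent group holds one hundred eight papers. The function name is hypothetical, and difficulty is expressed here as the proportion succeeding.

```python
def difficulty_index(correct_upper, correct_lower, group_size):
    """Proportions succeeding in the upper and lower groups and their average."""
    p_upper = correct_upper / group_size
    p_lower = correct_lower / group_size
    return p_upper, p_lower, (p_upper + p_lower) / 2.0


# Hypothetical item: 81 of 108 upper papers and 27 of 108 lower papers correct.
p_upper, p_lower, difficulty = difficulty_index(81, 27, 108)
print(p_upper, p_lower, difficulty)
```

The two proportions are the same pair that is entered in the chart to read the validity coefficient, so one tabulation of the upper and lower groups serves both indices.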

In conclusion, it should be mentioned that such a procedure as that 
just described provides a very rapid method of obtaining difficulty and 
validity indices for items when the tests have been administered with 
answer sheets for the International test-scoring machine. The item- 
analysis unit tabulates the number of correct responses to ninety items 
for a group of one hundred papers in about fifteen minutes. Thus a 
simple item analysis of the type herein described for about four hundred 
cases on a one hundred fifty item test would require only a couple of 
hours after the tests had been scored. The scoring itself should 
require a little more than an hour under these circumstances. 


SUMMARY 


Most recent discussions of the various techniques which have been 
proposed for the selection of test items have contained major fallacies. 
Empirical studies have, in general, overlooked the fact that there are 
two very important considerations involved in any selection of test 
items. The first consideration is item validity or discriminating 
power, and the second one is item difficulty. The empirical selection 
of items by means of a single index of item validity overlooks the 
necessity for separate consideration of these two criteria, and any 
comparison of methods on this basis is, therefore, of only trivial value. 

Logical discussions concerning the optimum distribution of item 
difficulties have invariably overlooked the very important bearing of 
item validities on this function. Therefore, these discussions may also 
be regarded as having little practical significance. 

In certain special situations, secondary considerations such as item 
intercorrelations, length of time required for a response, objectivity of 
scoring, ‘“‘face-validity,’’ and defensibility or authoritativeness are 
important factors in determining which items should be chosen. 

A short method of obtaining item validity and item difficulty 
indices was presented. The chart utilized in this procedure was based 
on Kelley’s finding that upper and lower groups containing twenty- 
seven per cent of the cases were optimum for certain related estima- 
tions. Pearson’s Tables for Statisticians and Biometricians was used in 
obtaining the charted values. 

Though the method does not in any way depend on any particular 
type of item or method of scoring, its brevity is probably best illus- 
trated by stating that within three or four hours of the time four 
hundred tests of one hundred fifty items are received in the office, a 
simple item analysis including the necessary scoring and checking may 
be obtained if the International test-scoring machine is used. 




THE CALCULATION OF TEST RELIABILITY 
COEFFICIENTS BASED ON THE METHOD 
OF RATIONAL EQUIVALENCE 


M. W. RICHARDSON AND G. F. KUDER 
The University of Chicago 


The authors have previously published a theoretical paper in which 
several new formulas for the estimation of test reliability were derived.¹ 
The present paper is partly in response to suggestions from various 
persons that the actual computations be explicitly outlined. In this 
way it may be possible to show that at least three of the four formulas 
are feasible in ordinary testing practice. One of the more precise 
formulas requires less time than does the Spearman-Brown split-test 
technique which has become almost a ritual among testers. Finally, 
this paper gives us an opportunity to present further results. Such 
empirical results cannot, however, furnish proof or disproof of the 
theoretical bases of the formulas. They do help to describe the opera- 
tion of the formulas on tests which do not meet the assumptions fully. 

Perhaps it should be said at the outset that the lack of satisfaction 
by a test of the special assumptions basic to the theory is not peculiar 
to these formulas, nor to test theory, for that matter. For example, 
the split-test Spearman-Brown technique assumes equal standard 
deviations of the two halves, and also implicitly assumes that the 
correlation coefficient between the two halves is representative of the 
many different coefficients that could be obtained if the test were 
halved in different ways. The particular way of splitting the test that 
is adopted in any given situation determines the value of the reliability 
coefficient that will be obtained. The particular split may not select a 
representative value from the many different estimates possible. The 
lack of uniqueness of a split-test estimate, plus the fact that the 
standard deviations of the two half-tests are not often equal, operate to 
make the method rather unsatisfactory in practice.² 
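
The dependence of the split-test estimate on the particular split can be seen in a small example. The response matrix below is invented for illustration (eight persons, six items scored 1 or 0), and the function names are hypothetical; the Spearman-Brown step is the usual doubling formula applied to the correlation between the two halves.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(responses, first_half_items):
    """Spearman-Brown estimate from one particular split of the items."""
    n_items = len(responses[0])
    second_half_items = [i for i in range(n_items) if i not in first_half_items]
    half_a = [sum(person[i] for i in first_half_items) for person in responses]
    half_b = [sum(person[i] for i in second_half_items) for person in responses]
    r_halves = pearson_r(half_a, half_b)
    return 2.0 * r_halves / (1.0 + r_halves)


# Hypothetical data: one row per person, one column per item.
RESPONSES = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

odd_even = split_half_reliability(RESPONSES, [0, 2, 4])
first_last = split_half_reliability(RESPONSES, [0, 1, 2])
print(round(odd_even, 3), round(first_last, 3))
```

For these invented data the odd-even split gives roughly .59 and the first-half against second-half split roughly .83, so the same set of papers yields two quite different coefficients.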

1 Kuder, G. F., and Richardson, M. W.: “The Theory of the Estimation of 
Test Reliability.” Psychometrika, Vol. ii, 1937, pp. 151-160. 
2 It should be noted that, in the authors’ opinion, these formulas apply no 
better to time-limit tests than does the split-test Spearman-Brown method. 

Although the theory of the new reliability formulas will not be 
repeated here, it is desirable to state that the reliability coefficient is 
defined as the coefficient of correlation between one experimental form 
of a test and a hypothetically equivalent form. Equivalence is 
precisely defined in terms of the items or elements of the test. The 
departures from exact equivalence are rationally defined, and are not 
dependent upon the experimenter’s inevitable failure to construct two 
test forms which are closely equivalent. 

The more exact formulas require more information than is ordinarily 
provided in the analysis of a test score distribution. None of the 
methods requires a rescoring of the test by halves or otherwise. All of 
them require the computation of the test-score variance (square of the 
standard deviation). In any event, the standard deviation (and 
variance) and the mean are always computed as a part of the descrip- 
tion of the tested population. 

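The four formulas listed below are assigned numbers to correspond 
to those in the original paper. They are, in order of their theoretical 
exactness: 

$$r_{tt} = \frac{\sigma_t^2 - \sum pq}{2\sigma_t^2} + \sqrt{\frac{\sum r_{it}^2\, pq}{\sigma_t^2} + \left(\frac{\sigma_t^2 - \sum pq}{2\sigma_t^2}\right)^2} \tag{8}$$

$$r_{tt} = \frac{\sigma_t^2 - \sum pq}{\left(\sum \sqrt{pq}\right)^2 - \sum pq} \cdot \frac{\left(\sum \sqrt{pq}\right)^2}{\sigma_t^2} \tag{14}$$

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - \sum pq}{\sigma_t^2} \tag{20}$$

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2} \tag{21}$$
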
Explanation of symbols: 

$r_{tt}$ is the reliability coefficient of test $t$. 

$\sigma_t^2$ is the variance (square of the standard deviation) of the scores 
on test $t$. 

$p$ is the percentage of correct answers given to a test item (expressed 
as a proportion). 

$q = 1 - p$ = percentage of incorrect answers given to the test item. 

$pq$ is the variance of a single item. 

$\sum pq$ means the sum of the item variances, or sum of the products 
of $p$ and $q$, for all the test items. 

$r_{it}$ is the coefficient of correlation between any item $i$ and the test $t$. 

$\sum r_{it}^2\, pq$ means the sum of the products of the square of each 
item-test coefficient by the corresponding item variance, the summation 
being made over the items. 

$\sum \sqrt{pq}$ means the sum of the standard deviations of the $n$ items. 

$n$ is the number of items. 

$\bar{p}$ is the average percentage of correct answers given to an item, as 
computed by dividing the mean by the number of items; $\bar{q} = 1 - \bar{p}$. 

It should be noted that Formula (8) requires an item analysis which 
provides an item-test coefficient for each item and the percentage of 
correct answers on each item. Formulas (14) and (20) require only 
the percentage of correct answers on each item, in addition to the 
standard deviation. Formula (21) requires only the mean, standard 
deviation, and the number of items. It is to be noted that Formulas 
(14), (20), and (21) do not require the computation of correlation 
coefficients between two sets of scores. 
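
The quantities these formulas call for can all be obtained from a matrix of item scores. The sketch below shows one way of tabulating them; the response matrix is invented for illustration, the function name is hypothetical, and the variance is taken with divisor N, which is an assumption about the convention intended.

```python
import math


def item_analysis(responses):
    """Per-item p, pq, sqrt(pq) and item-test r, plus the test mean and variance.

    responses: one list of 0/1 item scores per person.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_t = sum(totals) / n_people
    # Variance with divisor N (assumed convention, not stated in the text).
    var_t = sum((t - mean_t) ** 2 for t in totals) / n_people
    rows = []
    for i in range(n_items):
        item = [person[i] for person in responses]
        p = sum(item) / n_people
        pq = p * (1.0 - p)
        cov = sum((x - p) * (t - mean_t) for x, t in zip(item, totals)) / n_people
        r_it = cov / math.sqrt(pq * var_t)
        rows.append({"p": p, "pq": pq, "sqrt_pq": math.sqrt(pq), "r_it": r_it})
    return rows, mean_t, var_t


# Hypothetical data: eight persons, six items.
RESPONSES = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

rows, mean_t, var_t = item_analysis(RESPONSES)
sum_pq = sum(row["pq"] for row in rows)
sum_sqrt_pq = sum(row["sqrt_pq"] for row in rows)
sum_r2pq = sum(row["r_it"] ** 2 * row["pq"] for row in rows)
print(mean_t, var_t, sum_pq)
```

Only the mean and variance are needed for Formula (21); the sum of pq serves Formula (20), the sum of the item standard deviations is added for Formula (14), and the item-test coefficients are needed only for Formula (8).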
It may be instructive to indicate in a table the requirements of the 
various methods of estimating reliability coefficients. The cross 
denotes that the operation is required by, or implicit in, the method. 

TABLE I 

Number | Operation required, description                                                 | Formula 8 | Formula 14 | Formula 20 | Formula 21 | Split-test Spearman-Brown | Equivalent forms 
1      | Construction of an equivalent form                                              | ..        | ..         | ..         | ..         | ..                        | X 
2      | Computation of mean score                                                       | ¹         | ¹          | ¹          | X          | X                         | X 
3      | Computation of standard deviation of test scores                                | X         | X          | X          | X          | X                         | X 
4      | Computation of coefficient of correlation between each item and the total test | X         | ..         | ..         | ..         | ..                        | .. 
5      | Computation of percentage of correct answers on each item                       | X         | X          | X          | ..         | ..                        | .. 
6      | Rescoring split halves of a test                                                | ..        | ..         | ..         | ..         | X                         | .. 
7      | Substitution in a special formula, after other operations are performed         | X         | X          | X          | X          | X                         | .. 
8      | Computation of a coefficient of correlation between two sets of scores          | ..        | ..         | ..         | ..         | X                         | X 

¹ Incidentally computed, but not used directly in the formula. 

The analysis furnished by Table I is pertinent to the question of the 
amount of labor involved in making an estimate of the reliability 
coefficient, neglecting for the moment differences in accuracy of the 
estimate. The problem of efficiency in getting the estimate then 
resolves itself into the relative time spent in operations 1 to 8 respec- 
tively. Some conclusions emerge from an inspection of the table and 
from experience common to test technicians. One of them is that 
Formula (8) requires more computational labor than any other in the 
list. A conclusion from the experience of the authors is that the 
results given by Formula (8) are so closely approximated by Formulas 
(14) and (20) that the additional labor is not justified in the normal 
situation. It is, of course, recognized that additional labor may be 
expended if it is sufficiently important for the purpose of the investiga- 
tion that an extremely high order of accuracy be obtained. The 
formulas used in the Method of Rational Equivalence tend to give 
slight underestimates of the ‘‘true” value of the reliability coefficient. 
The authors believe that it is better to overestimate the relative 
amount of measurement error than to underestimate it. This is 
equivalent to preferring an underestimate to a fluctuating estimate 
of the reliability coefficient. 

The latter requirement of setting the upper limit of the relative 
amount of error present in a test score distribution is met by Formula 
(21). This formula will in most cases underestimate, and will never 
overestimate, the reliability coefficient; i.e., it will overestimate the 
percentage of variance which is error. Moreover the computation of 
this coefficient requires little labor. The minimum possible parameters 
used in the description of a test group provide all the data needed. 
Formula (21) may be regarded as a foot-rule method of setting the 
lower limit of the reliability coefficient, or the upper limit of error. 
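
Part of this statement can be checked directly from the definitions. The lines below are a sketch using only the symbols already explained, with the sums taken over the $n$ items; they show that Formula (21) can never exceed Formula (20), which is a narrower result than the claim about the true coefficient, since that claim also rests on the assumptions of the original paper.

```latex
\sum pq \;=\; \sum p - \sum p^2
        \;=\; n\bar{p} - \Bigl( n\bar{p}^2 + \sum (p - \bar{p})^2 \Bigr)
        \;=\; n\bar{p}\bar{q} - \sum (p - \bar{p})^2
        \;\le\; n\bar{p}\bar{q},
\qquad\text{hence}\qquad
\frac{n}{n-1}\cdot\frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2}
\;\le\;
\frac{n}{n-1}\cdot\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}.
```

The two formulas agree only when every item has the same percentage of correct answers; the more the item difficulties vary, the further Formula (21) falls below Formula (20).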

With the understanding that Formula (21) ordinarily furnishes an 
underestimate of the reliability coefficient, it becomes pertinent to 
inquire into the matter of the relative amount of labor involved in the 
application of Formulas (14) or (20) and the usual techniques. Com- 
paring either of these formulas against the split-test technique, it is 
apparent from the table of necessary operations that the computation 
of the percentage of correct answers for n items must be compared with 
the rescoring of the halves of the split test and the computation of the 
coefficient of correlation between the scores on the halves. The 
authors have made only preliminary accountings of the time required 
for each, but are nevertheless of the opinion that the straightforward 
applications of Formulas (14) or (20) will in many cases require less 
time than does the traditional split-test Spearman-Brown technique. 
The cardinal fact about a single estimate by the split-test technique 
is that no one (not even the investigator) can be sure of the direction 
in which the estimate errs; it is commonly assumed that it over- 
estimates the reliability of a test. 

Considerations which are pertinent to the amount of labor involved 
in computing the value of $p$ for each item, as required in Formulas (14) 
and (20) are (a) that the latest machine scoring methods¹ provide easy 
methods of counting the percentage of correct answers for each item, 
and (b) that facilitating tables are available for the computation of 
$\sqrt{pq}$ and $pq$.² 

Table II illustrates in abbreviated form the method of calculating 
the reliability coefficient by Formulas (8), (14), and (20). The entire 
table must be filled out for Formula (8). Only certain selected 
columns, as indicated below, need be obtained for Formulas (14) and 
(20). The footings of the table are the sums of the respective columns. 
The data comprise a ninety-item test of aptitude for the physical 
sciences. As computed by the usual methods, $M_t = 50.820$ and 
$\sigma_t^2 = 148.0276$. 

TABLE II 

A           | B        | C          | D               | E    | F       | G 
Item number | $r_{it}$ | $r_{it}^2$ | $r_{it}^2\, pq$ | $p$  | $pq$    | $\sqrt{pq}$ 
1           | 0.235    | .0552      | .0036           | .930 | .0651   | .2551 
2           | 0.312    | .0973      | .0144           | .820 | .1476   | .3842 
3           | 0.175    | .0306      | .0025           | .910 | .0819   | .2862 
4           | 0.116    | .0135      | .0020           | .815 | .1508   | .3883 
. . .       | . . .    | . . .      | . . .           | . . .| . . .   | . . . 
90          | 0.057    | .0032      | .0004           | .860 | .1204   | .3470 
Sum         | ....     | ....       | 1.3678          | .... | 17.7914 | 39.6092 

To solve by means of Formula (8) we use the sum of columns D and 
F with $\sigma_t^2$. By this formula 

$$r_{tt} = \frac{148.0276 - 17.7914}{2 \times 148.0276} + \sqrt{\frac{1.3678}{148.0276} + \left(\frac{148.0276 - 17.7914}{2 \times 148.0276}\right)^2}.$$














1 The reference here is to the International Business Machines 1939 model of 
their scoring machine. 

2 Facilitating tables and nomographs suitable for the new formulas have been 
prepared by Max D. Engelhart and Hugh Lewis. 



























The result is 0.890, given here to three decimal places for sake of 
comparison. 

To solve by means of Formula (14), we use the sums of columns F 
and G only, with $\sigma_t^2$. By this formula, 

$$r_{tt} = \frac{148.0276 - 17.7914}{(39.6092)^2 - 17.7914} \cdot \frac{(39.6092)^2}{148.0276} = 0.890.$$

To solve by means of Formula (20), we use only the sum of column 
F, together with $\sigma_t^2$ and $n$. Substituting these values, we have 

$$r_{tt} = \frac{90}{89} \cdot \frac{148.0276 - 17.7914}{148.0276} = 0.890.$$

The solution according to Formula (21) does not require any of the 
data in Table II, but only $n$, $M_t$, and $\sigma_t^2$. We have 

$$\bar{p} = \frac{M_t}{n} = \frac{50.820}{90} = 0.5647.$$

Hence 

$$r_{tt} = \frac{90}{89} \cdot \frac{148.0276 - 90(0.5647)(1.0000 - 0.5647)}{148.0276}$$


which equals 0.860. This estimate is less by 0.03 than that given by 
the more exact formulas. Although this reliability coefficient may be 
computed in five minutes of time from the mean, standard deviation, 
and the number of items, its accuracy is often high enough for practical 
purposes. 
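
The four computations just carried out can be repeated mechanically. The sketch below takes the footings of Table II together with the mean, variance, and number of items reported above; the function names are illustrative and are not part of the original procedure.

```python
import math


def formula_8(var_t, sum_pq, sum_r2pq):
    """Formula (8): uses item variances and item-test coefficients."""
    a = (var_t - sum_pq) / (2.0 * var_t)
    return a + math.sqrt(sum_r2pq / var_t + a * a)


def formula_14(var_t, sum_pq, sum_sqrt_pq):
    """Formula (14): uses item variances and item standard deviations."""
    s2 = sum_sqrt_pq ** 2
    return (var_t - sum_pq) / (s2 - sum_pq) * s2 / var_t


def formula_20(var_t, sum_pq, n):
    """Formula (20): uses the sum of the item variances."""
    return n / (n - 1) * (var_t - sum_pq) / var_t


def formula_21(var_t, mean_t, n):
    """Formula (21): uses only the mean, variance, and number of items."""
    p_bar = mean_t / n
    return n / (n - 1) * (var_t - n * p_bar * (1.0 - p_bar)) / var_t


# Values taken from the text and from the footings of Table II.
N_ITEMS, MEAN_T, VAR_T = 90, 50.820, 148.0276
SUM_R2PQ, SUM_PQ, SUM_SQRT_PQ = 1.3678, 17.7914, 39.6092

r8 = formula_8(VAR_T, SUM_PQ, SUM_R2PQ)
r14 = formula_14(VAR_T, SUM_PQ, SUM_SQRT_PQ)
r20 = formula_20(VAR_T, SUM_PQ, N_ITEMS)
r21 = formula_21(VAR_T, MEAN_T, N_ITEMS)
print(round(r8, 3), round(r14, 3), round(r20, 3), round(r21, 3))
```

To three decimal places the first three functions return the value 0.890 and the last returns 0.860, in agreement with the figures worked out above.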

Reliability coefficients of the same test computed by the split-test 
Spearman-Brown method are here presented with the results obtained 
from the four formulas above, for purposes of comparison. 





Odd versus even items, as printed......................... 0.908 
Halves with balanced difficulty........................... 0.902 
[illegible]............................................... 0.894 
Formula (8)............................................... 0.890 
Formula (14).............................................. 0.890 
Formula (20).............................................. 0.890 
Formula (21).............................................. 0.860 


The following estimates of reliability obtained by use of the various 
formulas from reading tests administered in the Chicago schools to 
children in Grades II, III, and IV have been kindly furnished by 
Dr. Max D. Engelhart: 




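RELIABILITY 
[illegible]............................................... .987 
[illegible]............................................... .977 
[illegible]............................................... .977 
[illegible]............................................... .966 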


The results obtained for the two tests reported here bear out the 
conclusion drawn from the data presented in the earlier paper, i.e., that 
Formula (20) is adequate for most situations, producing a figure which 
is the same or slightly lower than that to be obtained by use of the most 
rigorous formula (Formula 8). 


SUMMARY 


(1) An analysis of the computational operations required for the 
reliability formulas by the Method of Rational Equivalence shows that 
the recommended formula (Formula 20) requires not more, and per- 
haps less, computational labor than does the split-test Spearman- 
Brown method. 

(2) Further empirical results show that the values computed by the 
recommended formula are close approximations to those computed by 
the more rigorous formulas. 

(3) For situations in which the investigator is satisfied with an 
underestimate of reliability, the formula which utilizes the mean, 
standard deviation, and number of items is ordinarily sufficient. 








RELATIONSHIP BETWEEN INTELLIGENCE 
AND GAINS IN READING ABILITY 


CONSTANCE M. McCULLOUGH 
Western Reserve University 


Previous studies pertaining to remedial work in reading have 
exhibited a substantial relationship between reading ability and 
intelligence as measured by standardized tests. Ranging from coefficients
of .39 to .90, the correlational findings of these studies have
centered about .50 to .70.¹ Such findings have prompted practical
workers in the field to base their selection of students for remedial 
classes chiefly on the mental age and reading age. If the mental age 
exceeded the reading age, it was thought probable that the student in 
question was capable of improvement. If, however, the reading age 
was equal to the mental age, or nearly so, it was assumed that attempts 
to improve the student would be futile. 

The fallacy in the above reasoning lay in the assumption that the 
existence of a positive relationship between mental age and reading age 
was an assurance of a similar relationship between mental age and the 
improvability of a student in a remedial course. While research on the 
elementary-school level has suggested that intelligence is of importance 
to reading improvement, two studies in the past three years have given 
consistent evidence that, within the range of intelligence test scores 
concerned, the factor of mental ability plays a negligible part in the 
progress made in corrective reading programs on the high-school and 
college levels. 

During the second semester of the school year 1935-1936 at the 
Edison High School in Minneapolis, a class of ninth-grade students 
whose Kuhlmann-Anderson Intelligence Test scores indicated intelli- 
gence quotients ranging from 80 to 157, was organized for corrective 
work in reading comprehension. For nine weeks, from February to 
April, 1936, the six girls and eighteen boys met five class hours a week 
for corrective instruction. Exercises for the improvement of reading
comprehension, the reading of easy, interesting books, and individual
conferences were included in these periods. Different forms of the
New Stanford Silent Reading Test and the Traxler Silent Reading Test
were given at the beginning and at the end of the nine weeks of training.





1 Strang, Ruth: Problems in the Improvement of Reading, The Science Press, 
Lancaster, Pa., 1938, p. 304. 




At the conclusion of the experiment the average student in the 
group was reading 1.1 grade levels better according to these tests than 
he had been before training. Of the group, three boys had reading 
ages which exceeded their mental ages. These boys made more than 
average reading gains, their improvement being more than one grade in 
each case, as shown in Table I. The relationship between intelligence 
and reading improvement on the Traxler Silent Reading Test for the 
entire group of twenty-four students was .00 by the Pearson Product 
Moment Method of Correlation. These data in the Edison High 
School experiment suggested that further study of the function of 
intelligence in reading improvement be made. 
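A product-moment coefficient of the kind reported above can be computed as in the following minimal sketch. The intelligence quotients and gains shown are hypothetical, chosen only to illustrate a coefficient of zero; they are not the Edison High School data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical intelligence quotients and reading gains (in grades).
iq = [90, 100, 110]
gain = [1.0, 1.5, 1.0]
print(pearson_r(iq, gain))
```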


TABLE I.—RECORDS OF THREE BOYS WHOSE READING AGES EXCEEDED THEIR
MENTAL AGES ACCORDING TO STANDARDIZED TESTS AT THE EDISON HIGH
SCHOOL IN MINNEAPOLIS

                       Chrono-    Kuhlmann-Anderson    Reading age,    Mean improve-
Pupil                  logical    Intelligence Test    New Stanford    ment in grade¹
                       age          MA        IQ
[label illegible]....   14-0       15-6       111         15-8             1.5
[label illegible]....   14-9       13-8        88         13-9             2.0
[label illegible]....   15-2       13-4        88         13-8             1.3
Median for class.....   14-5       15-6       105         13-5             1.1




















1 Mean based upon improvement on Traxler and New Stanford Tests. 


In September, 1938, a remedial English course was initiated at 
Hiram College through the aid of the Carnegie Corporation of New 
York. Its purpose was to improve students in study habits, in oral 
and written expression, in reading speed and comprehension, and thus 
in the work of the intensive courses which, under the Hiram study plan, 
are taken singly and daily for periods of nine weeks each, while a 
running course, in this case remedial English, meets three times a week 
throughout the year. Forty-nine students, nine girls and forty boys,
were selected for the course on the basis of their performance on the 
Iowa Silent Reading Test, Form A, the Purdue Placement Test in 
English, Form A, and an autobiographical essay written extemporane- 
ously during Freshman Week. 

The members of the remedial English course were given other
diagnostic tests, including the Booker Silent Reading Test for College
Students, Form 4. The California Test of Mental Maturity was
administered as a check on the American Council Psychological Test, 
which had been taken by the entire freshman class on entrance, and 
which had shown the forty-nine remedial cases to range from the 
0.5 to the 75th percentile on the national norms. These two tests of 
intelligence were chosen for their provision of separate verbal and 
non-verbal scores. 

For ten weeks the students met three times a week for reading 
exercises of the study type, for discussion and written work on topics 
of interest to them, and for vocabulary and comprehension exercises 
from the textbooks of the intensive course. One individual conference 
each week (one-half hour in length) provided special assistance to each 
student according to his needs. 

At the end of ten weeks’ training the students were examined by 
different forms of the English tests which had been administered in 
September. Substantial gains were evidenced in every major phase 
of the work. By an average increase of twenty-two points on the 
Purdue English Test, the average individual in the class raised his 
status from the 39th percentile for college freshmen to the 66th. The 
mean total score on comprehension in the Iowa Silent Reading Test 
was raised from the 13.0 to the 13.4 grade level, while the mean rate 
score was increased from the 9.5 to the 11.7 grade level according to the 
national norms for that test. 

Because the gains effected in the ten weeks’ course compared favor- 
ably with those in similar remedial courses elsewhere and because the 
remedial group presented not only wide ranges of ability in intelligence 
and reading but also a diversity of gains in reading skill, a study of the 
relationships among reading speed, reading comprehension, and verbal 
and non-verbal intelligence was thought justified. Table II shows 
the coefficients of correlation among these various factors. 

It should first be noted that although the scores for an individual 
on the two intelligence tests fail to agree closely, the intelligence test 
scores bear somewhat similar relationships to reading-comprehension 
scores and to reading comprehension gains. The coefficients indicate
the presence of a relationship between verbal intelligence and initial 
reading comprehension scores. The relationship between non-verbal 
intelligence and reading comprehension suggested by the coefficients 
is meagre. The intelligence factor, verbal or non-verbal, is not 
prominent, apparently, among the factors affecting the speed of reading 
in the group studied. 




TABLE II.—COEFFICIENTS OF CORRELATION EXPRESSIVE OF RELATIONSHIPS
BETWEEN INTELLIGENCE TEST SCORES AND READING IMPROVEMENT IN HIRAM
COLLEGE EXPERIMENT¹

                                      A.C.E. Psychology       California Test of
                                      Test (N = 48)           Mental Maturity (N = 45)
                                      Language   Non-lan-     Language   Non-lan-
                                      score      guage score  score      guage score
Iowa Silent Reading Test
  Comprehension scores
    September 1938..................    .56        .27          .45        .14
    December 1938...................    .57        .27          .49        .24
  Comprehension gains...............   -.06       -.02          .09        .29
  Speed scores
    September 1938..................    .29        .19         -.01        .15
    December 1938...................    .27        .23          .18        .10
  Speed gains.......................    .07        .08          .23        .01
California Test of Mental Maturity
  (N = 44) language score...........    .28
  Non-language score................               .49

















¹ Similar correlations computed on the basis of test results for September 1938 and May 1939,
after an average improvement in comprehension of nearly two years and an average speed gain of
about two and one half years, yielded coefficients closely comparable to those in this table.


The teacher of remedial reading, however, is not so much concerned 
with the part intelligence has played in the student’s past as he is with the
function of intelligence in reading improvement. The coefficients of 
correlation between intelligence test scores and reading comprehension 
gains in the Hiram College group approximate the correlation found by 
the use of different tests at the Edison High School in Minneapolis. 
According to both studies there is no relationship between compre- 
hension improvement in such courses and intelligence scores. The 
same may be said in the Hiram study for intelligence scores and speed- 
of-reading gains. The absence of relationship between these variables 
is verified by the coefficients indicating the relationship between final 
scores in reading comprehension and speed and the same intelligence 
test scores; the September and December coefficients are practically 
identical. 

One interpretation of these data might be that in the improvement
of reading skills on the high-school and college levels intelligence is less
important than the combined forces of such factors as student home
and school backgrounds, physical and emotional equipment, attitudes,
incentives, and interests. Another might be that the rôle of intelligence
in a student’s reading improvement varies so greatly from one
individual to another on these levels that a generalization in terms of
correlation is scarcely descriptive. From either point of view, it seems
reasonable to assume that, within the limits of these data, the use of
intelligence test scores in the determination of a student’s eligibility for
remedial work in reading is not warranted.

The data further suggest to the teacher of remedial classes that his 
position in education is unique. His is no course through which bright 
students may pass with certain distinction and the dull bow their heads 
to inevitable ignominy. The ingredients for the improvement of 
reading are X plus intelligence, and the greater of these, the more
elusive and the more challenging, is X. The success of a remedial
program depends upon the extent to which the components of X can 
be identified and controlled. 


AN ANALYSIS OF THE RESULTS OF SPEED DRILLS 
WITH THE METRON-O-SCOPE TO INCREASE 
READING RATE 


F. M. GARVER AND R. D. MATTHEWS 
University of Pennsylvania


Rate of reading is conditioned by a number of factors, most of which 
can be influenced by training. Among these are the various types of 
eye movements involved in visual perception. If other factors are 
kept constant, rate can be increased by making simultaneously as 
many of the following changes as possible: decreasing the number of
fixations per unit of material read; decreasing the number of regressions
per unit of material read; decreasing the average duration of the
fixations; and increasing the span of recognition. Other factors which
may affect rate are the nature of the material read, the purpose one
has in reading, reading habits already acquired, and the degree of
comprehension expected.

This study is concerned with the effects on the nature of the various 
types of eye movements from a series of speed drills with the Metron-O- 
Scope, an instrument designed to increase rate by forcing rhythmic 
reading through the use of automatic shutters that expose successively 
about a third of a line of reading material at a time. The rates of the
exposures may be controlled by the operator so that the reader may be 
forced to read at any desired rate from about one hundred words per 
minute up to three hundred fifty words per minute. 

Factors other than eye movements which are indirectly controlled 
by the instrument were kept more or less constant by having the 
students involved in the study read merely to understand what they 
had read sufficiently well to take brief short form tests immediately 
after reading each selection, by using material of constant difficulty 
and interest appeal, and by giving no special attention to increasing the 
ability to comprehend, beyond that just mentioned to determine 
whether the student had actually read the selection. 

The students involved in the study were the members of the slowest 
sections of the seventh, eighth, and ninth grades of the Coatesville, 
Pennsylvania, Junior High School. In this school the pupils are 
classified into sections for instructional purposes on the basis of general 
ability to do the required work as determined by means of intelligence 
tests, reading tests, and teacher judgments. Prior to the experiment 
all these pupils were given Form A of the Iowa Silent Reading Tests,
and all had their eye movements photographed by means of the 
Ophthalm-O-Graph to determine the number of words read per minute 
and the nature of the eye movements. At the close of the experiment 
Form B of the Iowa Silent Reading Tests was used to see what had 
happened to their comprehending abilities. Their eye movements 
were again photographed. 

The experiment consisted of having two periods a week with the 
Metron-O-Scope conducted by their regular English teachers under the 
supervision of the authors. Each Monday and Thursday for ten weeks 
beginning with the second semester of school in February from twelve 
to seventeen minutes of the regular English period were devoted to 
reading one roll only of the prepared material and taking the tests that 
are printed on the roll at the end of the story material. This material, 
prose in nature and of such general interest as might be found in 
weekly newspapers published for use in schools, was never used as a 
basis for an English lesson. The students read it under the controlled 
conditions of the instrument and then passed on to their regular 
English work. 

In terms of means only, the seventh grade of sixteen members 
increased its rate from one hundred eighty-two to two hundred fifty- 
eight words per minute; decreased the number of fixations from one 
hundred twenty-nine to eighty-seven per one hundred words; decreased 
the number of regressions from twenty-four to eighteen per one 
hundred words read; increased the span of recognition from .81 to 
1.21 words; while the average duration of fixations moved from .27 to 
.28 of a second. All but one member of this class improved in rate 
during the ten week period. The spread for the class in initial rate 
was from one hundred ten words to two hundred eighty-two words per 
minute. In the final test the spread was from one hundred eighty- 
three to three hundred forty-eight words per minute. The average 
gain in comprehension for the class was thirteen points on the Iowa 
Reading Tests.

The eighth grade improved in rate from two hundred twenty-eight
to two hundred seventy-three words per minute; changed from one
hundred ten to ninety-two fixations per one hundred words; reduced
the regressions from twenty to nineteen; increased the span of recognition
from .94 words to 1.12 words. The average duration of the
fixations remained constant at .25 seconds. Four pupils in this group
of twenty-seven did not improve in rate. The initial spread in rate for
the class was from one hundred forty to three hundred sixty-nine words
per minute. The final spread was from one hundred ninety-four to five
hundred words per minute. The average gain in comprehension was
ten points.

Of the twenty-four members in the ninth grade class, three did not 
improve in rate of reading, but the average improvement was from 
two hundred fifteen to two hundred ninety-four words per minute. 
The number of fixations per one hundred words decreased from one 
hundred ten to seventy-seven, and the regressions from twenty to ten. 
The average span of recognition changed from .94 to 1.31 words. The 
average duration of the fixations was .27 seconds at the end of the 
period, the same as at the beginning. The class improved in compre- 
hension eleven points on the Iowa Test. The initial spread in number 
of words read per minute was from one hundred forty-five to three 
hundred sixteen, and the final spread was from one hundred ninety-four 
to four hundred. 

Of the sixty-seven students involved in the study only eight failed 
to make some increase in rate of reading as shown by Ophthalm-O- 
Graph records. In general the number of fixations per one hundred 
words and the number of regressions were markedly decreased and the 
average span of recognition increased. There was also a more than
normal average development in terms of total comprehension score as 
determined by the Iowa Silent Reading Tests. As might be expected 
the considerable increase in recognition span resulted in a fairly con- 
stant average duration of fixations. It takes time to assimilate what is 
perceived through the eye, and the larger the “eye-ful’’ the longer time 
it takes. In other words, duration of fixations is conditioned somewhat
by the size of the recognition span, and a uniform duration for a
larger span represents improvement in duration of fixations. 
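The point may be illustrated with the seventh-grade means reported above (duration .27 to .28 of a second, span .81 to 1.21 words). The sketch below divides average duration by average span to obtain fixation time per word; this quotient is an illustrative index introduced here, not a measure used by the authors, and it ignores the time spent in movements between fixations.

```python
def fixation_seconds_per_word(duration_s: float, span_words: float) -> float:
    """Average fixation time spent on each word: duration of one fixation
    divided by the number of words taken in per fixation."""
    return duration_s / span_words


# Seventh-grade means before and after the drills, as reported in the text.
before = fixation_seconds_per_word(0.27, 0.81)
after = fixation_seconds_per_word(0.28, 1.21)
print(round(before, 3), round(after, 3))
```

Although the duration of a fixation is practically unchanged, the time given to each word falls by roughly a third, which is the sense in which a uniform duration with a larger span represents improvement.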

According to Ophthalm-O-Graph records of rates of reading in the 
University of Pennsylvania Reading Clinic compiled with other such 
published records the average rates of pupils in the upper grades of the 
public schools are as follows: 





Grade...................  5th | 6th | 7th | 8th | 9th | 10th | 11th | 12th
Words per minute........  190 | 200 | 215 | 240 | 260 | 275  | 290  | 305





























According to the data presented above, it will be noted that the 
seventh-grade section raised its average rate from below the fifth-grade
level to about the ninth; the eighth-grade section raised its average rate
from a seventh-grade level to a tenth grade; and the ninth-grade section 
raised its average rate from a seventh-grade level to the eleventh grade. 
Each class section also made normal advancement in average total 
score in comprehension on the Iowa Silent Reading Tests. 
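The grade placements just stated can be checked against the table of average rates with a small lookup. The sketch below is illustrative only; the function name `bracket` is hypothetical, and it simply reports the two adjacent grade norms between which a rate falls rather than choosing one of them.

```python
# Average rates (words per minute) by grade, from the table above.
NORMS = {5: 190, 6: 200, 7: 215, 8: 240, 9: 260, 10: 275, 11: 290, 12: 305}


def bracket(rate: int):
    """Return (lower_grade, upper_grade) whose norms enclose the rate.
    None marks a rate that lies outside the table on that side."""
    lower = max((g for g, r in NORMS.items() if r <= rate), default=None)
    upper = min((g for g, r in NORMS.items() if r >= rate), default=None)
    return lower, upper


# Initial and final class means reported in the text.
for grade, (initial, final) in {7: (182, 258), 8: (228, 273), 9: (215, 294)}.items():
    print(grade, bracket(initial), bracket(final))
```

A rate of 182 falls below the fifth-grade norm, and 258 lies just under the ninth-grade norm of 260, in agreement with the statement for the seventh-grade section.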

To determine the effects on comprehension from a ten weeks’ series 
of intensive drills whose sole purpose was to increase the rate of reading, 
another ninth-grade section was used as a control group for the ninth- 
grade section used in the experiment. The control group was the next 
to the slowest section of the seven sections into which the ninth grade 
was classified. There were thirty pupils each in these two slowest
sections of the ninth grade. In so far as possible the pupils in these
two sections were paired as to intelligence quotients, reading quotients,
grade placement in achievement in rate and comprehension, and sex.
Using all these factors as bases it was found only twenty pairs could be
obtained that were at all comparable. In a few cases the pairing was 
a bit faulty, but not sufficiently so to destroy the significance of the 
experiment. 

The control group was given the same initial and final tests as was 
the experimental group; that is, all the pupils in the control group 
section were given both forms of the Iowa Silent Reading Tests and 
their eye movements were photographed with the Ophthalm-O-Graph 
to determine the rate of reading. Both sections had similar assign- 
ments by the same teacher in their English classes. The only differ- 
ence between the two was that the experimental group had speed drills 
on the Metron-O-Scope as a part of its regular English period for ten
weeks. 

There were six pairs of girls and fourteen pairs of boys in the 
experiment. The ranges in ages, intelligence quotients, reading 
quotients, and reading grade achievements according to school records 
were: 











                                                              School grade levels
                            CA              IQ           RQ        Comprehension    Rate
Control group...........  15-4 to 17-10   64 to 112    60 to 95    6.2 to 8.9     3.0 to 8.5
Experimental group......  14-1 to 18-8    61 to 108    69 to 92    5.2 to 9.2     3.0 to 12.0

























Initial and final tests for rate for each pupil in the experiment were 
made by photographing the eye movements and calculating the number 
of words read per minute. The Ophthalm-O-Graph instrument was 
used for this purpose as it gives a more reliable rate result than do the 
various standard tests that give rate measures, and, since the reading 
material used in it is well graded and is constant in form and difficulty 
for both the initial and the final tests, it is more reliable than a timed 
word count would probably be. Form A of the Iowa Silent Reading 
Test was used as the initial test of comprehension, and Form B of the 
same test for the final test of comprehension. 

The significant data in terms of averages concerning the experiment 
are indicated below: 














                             Rate (words per minute)         Comprehension score
                             Initial   Final   Difference    Initial   Final   Difference
Control group.............     244      263        19          108      115         7
Experimental group........     205      295        90           94      104        10




















The experimental group increased its rate of reading nearly five 
times over that of the control group. Of course, it had further to go 
since it started at a much lower rate, but it finished at a rate thirty-two
words per minute above that of the control group. There was no loss
in comprehending ability. Both groups gained in ability to compre- 
hend over the ten-week period, but the difference between the gains is 
so small as to be insignificant. 
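The comparisons in the preceding paragraph follow directly from the averages tabled above, as the short sketch below shows. The variable names are illustrative only.

```python
# Group averages (initial, final) taken from the table above.
control = {"rate": (244, 263), "comprehension": (108, 115)}
experimental = {"rate": (205, 295), "comprehension": (94, 104)}


def gain(pair):
    """Final minus initial average."""
    initial, final = pair
    return final - initial


rate_ratio = gain(experimental["rate"]) / gain(control["rate"])
final_lead = experimental["rate"][1] - control["rate"][1]
comprehension_gap = gain(experimental["comprehension"]) - gain(control["comprehension"])

print(round(rate_ratio, 2), final_lead, comprehension_gap)
```

The ratio of the two rate gains is a little under five, the experimental group finishes thirty-two words per minute ahead, and the comprehension gains differ by three points.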


CONCLUSIONS 


(1) In a short period of ten weeks with only two fifteen-minute
periods per week slow junior high school pupils can be brought up to
grade in rate of reading by the use of the Metron-O-Scope as an instrument
for controlling eye movements.

(2) Improvement in rate of reading comes from decreasing the 
number of fixations and the number of regressions per unit of material 
read, and from increasing the span of recognition. As the span of 
recognition is increased, the average duration of fixations tends to
remain constant, probably due to the fact that a longer span requires
a longer time for assimilation. 

(3) Speed drills under controlled conditions as exist with the use 
of the Metron-O-Scope for improving rate of reading do not result 
in a decrease in the ability to comprehend what is read, as is frequently 
claimed. 



































ACHIEVEMENT RATIOS OF COLLEGE STUDENTS 


PHILIP H. DuBOIS 
The University of New Mexico


The positive correlation between scores on psychological tests and 
grade point averages of college students, reported by Segel¹ to range
from .33 to .62 with a median of .44 in the case of the American Council 
on Education psychological examination, shows that in general the 
brighter students are those who achieve the most, when grades are 
taken as a measure of achievement. 

The purpose of the present study was to investigate the matter 
further so as to determine what students achieve most in proportion to 
their ability as measured by the psychological examination. 

The subjects in this investigation were three hundred eighty-five 
students in the lower division of the College of Arts and Sciences of the 
University of New Mexico for the first semester of the academic year 
1937-1938. Students who were registered for less than seven credit 
hours were eliminated and only fifteen of the three hundred eighty-five 
were registered for less than twelve hours. Eleven were registered for 
the maximum of eighteen hours. The mean number of hours carried 
was fifteen and eight-hundredths. 

The number of grade points earned by each student was computed 
by the system used by the University: Each hour of A, three grade 
points; each hour of B, two grade points; and each hour of C, one grade 
point. No grade points are given for hours of D, X or F. Grade 
points earned ranged from zero to fifty-one, with a mean of twenty 
and eighteen-hundredths. 
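The grade point system just described may be written out as a short sketch. The schedule shown is hypothetical, and the function name is illustrative.

```python
# Grade points per credit hour under the system described above.
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "X": 0, "F": 0}


def grade_points(courses):
    """Total grade points for a list of (letter_grade, credit_hours) pairs."""
    return sum(GRADE_POINTS[grade] * hours for grade, hours in courses)


# Hypothetical schedule of five three-hour courses.
schedule = [("A", 3), ("B", 3), ("C", 3), ("C", 3), ("D", 3)]
print(grade_points(schedule))
```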

Grade point averages were computed in the usual manner by divid- 
ing the number of grade points earned by the number of hours carried. 
These ratios varied from zero, indicating that all grades were D or 
below, to three, indicating that all grades were A. The mean grade
point average was 1.338, with a standard deviation of .678. The 
median was 1.313. The curve was slightly skewed in the positive 
direction, since there was a piling up of scores at the lower end of the 
distribution. Had the system of grade points included values for 
D’s, X’s, and failures, it is possible that the distribution would have 
been more normal. The curve was multimodal, most numerous 


¹ Segel, D.: “Prediction of success in college.” U.S. Off. Educ. Bull., 1934,
No. 15, pp. 98.

averages being 1.0, 1.3, 1.5, 1.2, 1.8 and 2.0. This may perhaps be 
accounted for by the fact that most courses on the schedules were 
carried for three hours, and that for a group with a constant number of 
three-hour subjects, the distribution of grade points cannot be 
continuous. 
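The discontinuity can be made concrete by listing every grade point average open to a student carrying a fixed number of three-hour courses. The sketch below is an illustration under that simplifying assumption; the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def possible_averages(n_courses: int, hours_per_course: int = 3):
    """All grade point averages attainable with n equal-credit courses,
    each earning 0, 1, 2, or 3 grade points per hour."""
    total_hours = n_courses * hours_per_course
    averages = set()
    for grades in combinations_with_replacement((0, 1, 2, 3), n_courses):
        points = sum(g * hours_per_course for g in grades)
        averages.add(round(points / total_hours, 2))
    return sorted(averages)


print(possible_averages(5))
print(possible_averages(4))
```

With five such courses only sixteen averages are possible, spaced two-tenths apart, so that values such as 1.0, 1.2, 1.8, and 2.0 must accumulate cases. An average of 1.5 is attainable with four courses but not with five, which suggests that the remaining modes come from students with other course loads.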

All students had taken some form of the American Council on 
Education psychological examination. By the use of the national 
norms the scores on editions other than 1936 were transmuted into 
terms of the 1936 Edition. The mean score was 190.97, the median 
188.68, and the standard deviation 55.08. This curve also was slightly 
skewed in the positive direction. 

The correlation between psychological examination scores and 
grade point averages was found to be .442 + .028, a finding in harmony 
with results elsewhere. 

Using the formula for T-scores, $T = 50 + 10\,(X - M)/\sigma$, the
grade point averages and the psychological scores were transmuted
into “scaled scores,” a type of standard scores with a mean of 50 and
standard deviation of 10.

The next step was to compute what may be called achievement 
ratios, similar in intent to the achievement quotients used in elemen- 
tary education. Instead of using the educational age divided by the 
mental age, the scaled score of the grade point average was divided 
by the scaled score on the psychological examination. 
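The computation may be sketched as follows, using the means and standard deviations reported above. The multiplication of the quotient by 100 is an assumption, inferred from the reported range of the ratios; the function names and the second student are hypothetical.

```python
def scaled_score(x: float, mean: float, sd: float) -> float:
    """Standard score with a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * (x - mean) / sd


def achievement_ratio(gpa: float, psych_score: float) -> float:
    """Scaled grade point average divided by scaled psychological score.
    The factor of 100 is assumed, not stated in the text."""
    t_gpa = scaled_score(gpa, mean=1.338, sd=0.678)          # grade point averages
    t_psych = scaled_score(psych_score, mean=190.97, sd=55.08)  # psychological examination
    return 100 * t_gpa / t_psych


# A student at the mean on both measures, and a hypothetical student
# with an average psychological score but a grade point average of 2.0.
print(round(achievement_ratio(1.338, 190.97), 1))
print(round(achievement_ratio(2.0, 190.97), 1))
```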

The achievement ratios so computed ranged from 51 to 188. The 
mean was 102.8, the median 100.34, and the standard deviation 22.88. 
Skewness was a little greater than with the grade point averages and 
psychological scores. 

Division was made into quintile groups according to the standing
on the psychological examination with the following results:




















Psychological examination             N     Mean AR      SD      SD_mean
Lowest quintile....................   83    120.72     26.13     2.87
Second quintile....................   73    104.26     19.44     2.28
Third quintile.....................   81    102.15     17.46     1.94
Fourth quintile....................   76     94.17     17.56     2.02
Highest quintile...................   72     88.64     16.00     1.89





D/SD_diff for differences among the quintile groups follow:

          2        3        4        5
1       4.50     5.36     7.53     9.35
2                 .71     3.32     5.29
3                         2.85     4.99
4                                  2.37

















All differences are in the same direction and are greater between 
quintiles 1 and 2, and 4 and 5, than between 2 and 3, and 3 and 4. 
Between the first and second quintiles and between all non-adjacent 
quintile groups the differences are statistically significant according
to the commonly accepted convention. 
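The ratios in the table can be approximated from the tabled means and standard deviations. The sketch below assumes that the standard deviation of a mean is SD divided by the square root of N and that the standard deviation of a difference is the square root of the sum of the two squared values; neither formula is stated in the text, and the function names are illustrative.

```python
from math import sqrt


def sd_of_mean(sd: float, n: int) -> float:
    """Standard deviation (standard error) of a group mean."""
    return sd / sqrt(n)


def critical_ratio(mean_a, sd_mean_a, mean_b, sd_mean_b):
    """Difference between two means divided by the SD of the difference,
    assuming independent groups."""
    return (mean_a - mean_b) / sqrt(sd_mean_a ** 2 + sd_mean_b ** 2)


# Lowest versus second quintile, figures from the table above.
print(round(sd_of_mean(26.13, 83), 2))
print(round(critical_ratio(120.72, 2.87, 104.26, 2.28), 2))
```

Under these assumptions the first quintile's standard deviation of the mean comes to about 2.87 and the ratio for quintiles 1 and 2 to about 4.49, close to the tabled 4.50.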


Similar results were observed when division was made into quintile 
groups on the basis of the achievement ratios: 








                                            Mean
AR                                    N     psychological     SD      SD_mean
                                            examination
Lowest quintile....................   73      223.73        54.36     6.36
Second quintile....................   80      204.10        45.48     5.09
Third quintile.....................   78      193.91        46.89     5.31
Fourth quintile....................   77      181.87        50.07     5.71
Highest quintile...................   77      144.69        42.14     4.80

















Again all differences are in the same direction and are greater
between quintiles 1 and 2, and 4 and 5, than between 2 and 3, and 3 and
4. All but one of the differences between non-adjacent groups and one
difference between adjacent groups are statistically reliable.
Reliabilities of the differences (D/SD_diff) follow:

          2        3        4        5
1       2.41     3.60     4.90     9.92
2                1.39     2.91     8.49
3                         1.54     6.88
4                                  4.98

















The conclusion seems inescapable that the lowest achievers in 
proportion to their ability tend to have the highest scores on the 
psychological examination and vice versa. 










Since in theory the individual having the highest score on the 
psychological examination cannot have an achievement ratio above 100 
and the individual having the lowest score cannot have an achievement 
ratio below 100, there is a chance that the achievement ratios are low 
for the brighter and high for the duller because of a statistical artifact. 
To eliminate this possibility the distribution of achievement ratios in 
each quintile group of the psychological examination was tested for 
normality by applying the chi-square test for goodness of fit. P values 
were: Lowest quintile, .81; second quintile, .23; third quintile, .66; 
fourth quintile, .55; and highest quintile, .63. In no case is it definitely 
shown that the distribution of the AR’s within the quintiles is not 
normal, and, with the exception of the second quintile, the chances are 
better than even that the discrepancies from the normal curve are due 
to chance or fluctuations in sampling. 
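A goodness-of-fit test of this kind can be sketched as follows. The class limits and the achievement ratios are hypothetical, the normal curve is fitted from the sample mean and standard deviation, and the sketch stops at the chi-square statistic and its degrees of freedom; the P value would then be read from a chi-square table.

```python
from statistics import NormalDist, mean, pstdev


def chi_square_normal(values, edges):
    """Chi-square statistic comparing class frequencies with those expected
    under a normal curve fitted to the same values."""
    fitted = NormalDist(mean(values), pstdev(values))
    n = len(values)
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    statistic = 0.0
    for low, high in zip(bounds, bounds[1:]):
        observed = sum(low <= v < high for v in values)
        expected = n * (fitted.cdf(high) - fitted.cdf(low))
        statistic += (observed - expected) ** 2 / expected
    # Classes minus one, minus two more for the fitted mean and SD.
    degrees_of_freedom = len(bounds) - 1 - 1 - 2
    return statistic, degrees_of_freedom


# Hypothetical achievement ratios for one quintile group.
ratios = [88, 95, 101, 104, 110, 118, 120, 126, 133, 150]
statistic, df = chi_square_normal(ratios, edges=[100, 120, 140])
print(round(statistic, 2), df)
```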

The study leads to the following conclusions: 

(1) By dividing scaled scores it has been found possible to compute 
achievement ratios for college students. In this study the ratio 
between grade point average and attainment on the American Council 
on Education psychological examination was used. There is no reason 
why other criteria both of attainment and of ability might not be 
employed. A group of aptitude tests could be used as measures of 
capacity, and differences between successive comprehensive examina- 
tions could be used as measures of attainment. 

(2) The low correlation between scores on the psychological 
examination and grade point averages is not to be attributed merely to 
scatter of attainment at different intellectual levels. Analysis by 
means of achievement ratios shows that the superior group accom- 
plishes considerably less in proportion to its ability than the inferior 
group. Accomplishment as measured by the achievement ratio 
decreases fairly regularly as scholastic aptitude increases. 

(3) It seems plausible that a university or college as a social institu- 
tion puts considerable pressure upon the group relatively low in the 
intellectual scale, but puts little pressure on those relatively high in the 
intellectual scale. The brighter students can obtain passing grades 
without working at capacity, while duller students must work nearer 
their intellectual limit in order to meet institutional demands. 

(4) The achievement ratio technique may have practical value for
administrative officers in identifying those students who are not 
achieving in proportion to their capacity to achieve. 
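
Conclusion (1) can be illustrated with a minimal sketch. The scaling shown here (rescaling each measure to a mean of 50 and a standard deviation of 10) is an assumption made only for illustration and is not necessarily the scaling used in the study; the function names and the three-student data set are hypothetical.

```python
from statistics import mean, pstdev

def scaled_scores(values):
    """Rescale raw values to mean 50, SD 10 (an assumed scaling, for illustration)."""
    m, s = mean(values), pstdev(values)
    return [50 + 10 * (v - m) / s for v in values]

def achievement_ratios(attainment, ability):
    """Return 100 * scaled attainment / scaled ability for each student."""
    scaled_att = scaled_scores(attainment)
    scaled_abl = scaled_scores(ability)
    return [100 * a / b for a, b in zip(scaled_att, scaled_abl)]

# Hypothetical data: grade point averages and psychological examination scores.
gpa = [1.0, 2.0, 3.0]
exam = [80, 100, 120]
ratios = achievement_ratios(gpa, exam)
```

With these hypothetical figures attainment keeps exact pace with ability, so every ratio is 100; a ratio below 100 would mark a student achieving less than his scaled ability predicts.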










A NOTE ON RELIABILITY BY THE CHANCE 
HALVES METHOD¹ 


CECIL B. READ 
University of Wichita, Wichita, Kansas 


Practically every discussion of the determination of the reliability 
of a test describes as one of three methods the correlation between two 
halves of the test, then application of the Spearman-Brown formula to 
determine the coefficient of reliability for the test. Generally the 
statement is made that chance halves are obtained by selecting the 
odd and even items of a test; in some cases no further comment is 
made; occasionally it is stated that some other method of obtaining the 
chance halves might be used but the results would be equivalent. 
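
The procedure just described can be sketched briefly. For a test divided into two halves, the Spearman-Brown formula estimates the reliability of the whole test as 2r / (1 + r), where r is the correlation between the halves. The function name below is illustrative only.

```python
def spearman_brown(r_half: float) -> float:
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

# A correlation between halves of .923 steps up to about .960.
print(round(spearman_brown(0.923), 3))
```

Because the formula is monotonic in r, any method of choosing halves that raises the correlation between halves also raises the estimated reliability.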

There seems to be little or no information regarding the possibility 
of a different coefficient of reliability being obtained from different 
chance halves. In an attempt partially to answer this question various 
methods of selecting chance halves were used and the corresponding 
coefficients of reliability were computed. The test used in the study 
was the Iowa High School Content examination, given as a part of the 
regular entrance program to four hundred eighty-six freshmen entering 
the University of Wichita in September, 1938. All questions in this 
test are of the multiple-choice type, five possible choices being given. 
The test consists of four hundred questions in all; of these one hundred 
ten are in Section I (English), seventy-five are in Section II (Mathe- 
matics), one hundred are in Section III (Science), and one hundred 
fifteen are in Section IV (Social Science). Each section is scored 
separately, the score being the number right. 

Time limits for each section result in many of the later questions in 
each section remaining unanswered. For this reason halves obtained 
by such procedures as taking the first and second halves of each section 
would obviously give low reliability. The following methods of selecting 
chance halves were used (only the first half is described; the other half 
consists of the remaining items of the test): 


A Odd items 
B Even items on part I, odd items on other parts 
C Even items on parts I and II, odd items on other parts 
D Even items on parts I and III, odd items on other parts 
E Even items on part II, odd items on other parts 
F Even items on parts II and III, odd items on other parts 
H Even items on part III, odd items on other parts 
J Even items on part IV, odd items on other parts 

¹ Acknowledgment is made to the National Youth Administration for services 
of student assistants in scoring, checking, and tabulating material. 
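
The eight selection rules, methods A through J, can be written out explicitly. The following is a minimal sketch, assuming that items are numbered from 1 within each section (the note does not state the numbering) and that a student's responses are recorded as 1 for right and 0 otherwise; all names are hypothetical.

```python
# Number of items in each section, as given in the text.
SECTION_SIZES = {"I": 110, "II": 75, "III": 100, "IV": 115}

# Sections whose EVEN items enter the first half; all other sections give odd items.
EVEN_SECTIONS = {
    "A": set(),
    "B": {"I"},
    "C": {"I", "II"},
    "D": {"I", "III"},
    "E": {"II"},
    "F": {"II", "III"},
    "H": {"III"},
    "J": {"IV"},
}

def first_half(method):
    """Return the (section, item number) pairs forming the first half."""
    even = EVEN_SECTIONS[method]
    half = []
    for section, n_items in SECTION_SIZES.items():
        start = 2 if section in even else 1  # assumed numbering from 1
        half.extend((section, i) for i in range(start, n_items + 1, 2))
    return half

def split_scores(responses, method):
    """Score both halves; `responses` maps (section, item) to 1 (right) or 0."""
    chosen = set(first_half(method))
    first = sum(v for k, v in responses.items() if k in chosen)
    second = sum(v for k, v in responses.items() if k not in chosen)
    return first, second
```

Correlating the two half scores over all students, and stepping the correlation up by the Spearman-Brown formula, gives one reliability coefficient per method.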


With these methods, coefficients of correlation and reliability coefficients 
(by use of the Spearman-Brown formula) were obtained as 
shown in Table I. 


TABLE I. RELIABILITY OF THE IOWA HIGH-SCHOOL CONTENT EXAMINATION 
USING VARIOUS METHODS OF SELECTING CHANCE HALVES 
(Four hundred eighty-six cases) 

                                           Reliability of test 
Method of selecting   Correlation       Reliability    Standard 
chance halves         between halves    coefficient    error 

A                     .923              .960           .004 
B                     .923              .960           .004 
C                     .921              .959           .004 
D                     .916              .954           .004 
E                     .925              .961           .004 
F                     .916              .954           .004 
H                     .918              .957           .004 
J                     .909              .952           .004 














It is interesting to compare the reliability coefficients obtained in 
this study with that mentioned in the manual accompanying the 
examination (.95, based on fifteen hundred fifty cases). It is also 
interesting to note that the method of odd versus even items gives one 
of the highest coefficients. 

Differences between reliability coefficients obtained by various 
methods of selecting the chance halves are small. The largest differ- 
ence is between E and J; this difference divided by its standard error 
yields 1.6. Certainly there is no marked discrepancy; if results were 
carried to two figures there would be at most a difference of .01, which 
would seem to indicate that the customary procedure of selecting odd 
and even items is justifiable. However, the fact that with only a 
slight variation in this procedure differences are as large as found in 
this study suggests the possibility of further analysis by similar studies 
elsewhere. 
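
The ratio of 1.6 reported above can be reproduced approximately under stated assumptions. The sketch takes the figure of .004 given for each coefficient in Table I as its standard error and treats the two coefficients as independent, so that the standard error of their difference is the square root of the sum of the squared errors. The note does not say how the standard error of the difference was computed, and coefficients drawn from the same cases are not strictly independent; the function name is hypothetical.

```python
import math

def critical_ratio(r1, r2, se1, se2):
    """Difference between two coefficients divided by its standard error,
    assuming (for illustration only) that the two are independent."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return (r1 - r2) / se_diff

# Methods E and J from Table I: .961 and .952, each with an error of .004.
ratio = critical_ratio(0.961, 0.952, 0.004, 0.004)
print(round(ratio, 1))
```

Under these assumptions the ratio is a little under 1.6, in agreement with the value reported.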







BOOK REVIEWS 


W. Carson Ryan. Mental Health through Education. New York: 
The Commonwealth Fund, 1938, pp. 315. 


One of the fundamental tenets of democratic political theory is the 
necessity of an educated electorate. In our own democracy, which at 
least in theory has universal suffrage, this means universal education. 
During our national history there has been a steady expansion of 
educational facilities and periodic restatements of educational ideals 
and aims. At the present time all social institutions are in flux, 
education no less than the rest. For twenty years or more outside 
critics and leading educators have been pointing out the inadequacy 
of an education based on the three “R’s” and their academic expansions. 
In this period there has been increasing emphasis—at least in 
the literature—on the mental hygiene ideal in education. 

During 1935 and 1936 Ryan, with support from the Common- 
wealth Fund, visited schools from the nursery to the graduate level to 
discover how far the much written about mental hygiene principles 
were actually operating in American schools. The present volume is a 
careful and stimulating evaluation of his findings. Written in a 
lucid style, sans tables and sans charts, but sacrificing nothing of 
scholarly workmanship, this report clearly and apparently accurately 
answers the author’s own question: “How does educational practice 
today, at every level and for every type of education, square with what 
is known of mental hygiene, and what further advances can be made?” 

If a reviewer may be allowed to answer the author’s question, that 
answer would be: With a few notable exceptions in schools at all levels, 
and with the almost universal exception of “nursery school and parent 
education,” the evidence shows the mental hygiene principles are not 
greatly in evidence in American education. Whether the author would 
agree with this answer without reservation, I do not know, but it is the 
only answer I can find in his book. In four areas Ryan finds conditions 
unfavorable for, if not antagonistic to, mental hygiene attitudes in 
schools. These areas are the teaching staff, the formal school program, 
the curriculum, and the school administration. 

Because of the teacher’s close association with the child, the teacher’s 
personality is important in the development of the child’s personality. 
However, available information concerning the mental health of teachers 
and prospective teachers “is not reassuring.” Furthermore, programs of 
teacher training are unbelievably lacking in requirements for work in 
mental hygiene of any sort. For their adherence to “content” and 
“methods” courses Ryan criticizes the teacher-training institutions. 
Unfortunately he says nothing about the state licensing bodies which 
determine in a large measure what the schools of education must teach. 
The other group of professional school personnel, the administrators, 
Ryan finds mainly concerned with “the externals of management.” 

In the school program there are four well-established procedures 
that present “serious obstacles to mental health and sound education.” 
These are grades and promotion, recitations and home work, examina- 
tions and marks, and discipline. To these may be added the rigidity 
of the commonly existing curriculum. The author points out that 
in spite of official formulations of educational philosophy which recognize 
mental hygiene principles, “the school curriculum remains, in 
large part, excessively narrow and bookish.” 

It must not be inferred from the preceding paragraphs that Ryan 
is a carping critic. He gives full credit where credit is due. One 
whole chapter is devoted to progressive programs of teacher training 
and another to the valuable special mental hygiene services at present 
in use in many schools. Further, he points out that the school alone 
cannot work miracles. There must be coördination and coöperation 
between the family and the community and the schools. 

In the last chapter the author makes concrete proposals for the 
schools of the future. These are stated in eight propositions which are 
too long to enumerate, but the first implies all the rest, and it will serve 
as a fitting conclusion to this review. This first statement of essentials 
for achieving mental health through education says that we need 
“a re-facing of the educational task by school leaders and the general 
public, whereby education will endeavor to meet more fundamental 
human needs than those ordinarily dealt with in the conventional 
school.” C. M. Louttit. 

Indiana University. 


R. E. L. Faris and H. W. Dunham. Mental Disorders in Urban 
Areas. Chicago: University of Chicago Press, 1939, pp. 270. 


This book has the distinction of being the first extensive ecological 
study of mental disease in an urban area. Some thirty-five thousand 
cases of mental disorder admitted over a thirteen year period from the 
city of Chicago to public and private mental hospitals constitute the 
basic data. Supplementary data for Providence, R. I., are also 
reported. 

In accord with earlier workers, who found that the incidence of 
social pathology, such as crime, suicide, and delinquency, decreased as 
one proceeded from the center of a city to its periphery, the present 
authors report similar findings for mental disorders. Since the 
socio-economic status rises with increase in distance from the central area, 
this would indicate that the incidence of mental disease is inversely 
proportional to the socio-economic level of a community. This is 
perhaps true, but the differential area rates reported in this study are 
much exaggerated because of two factors which were recognized by the 
authors but were uncontrolled. One was the possible greater incidence 
of home care of the mentally ill in the peripheral and more affluent 
areas. The second and more important was the inability to control for 
the relative population turnover in various areas. In the more central 
and more disorganized rooming and renting areas, the mobility of the 
population was greater than in the more stable, home-owning, 
peripheral areas. Consequently the authors’ calculation of the mental 
disease rate of an area, made by dividing the total number of patients 
admitted from 1922 to 1934 from each area by the census population as 
of a certain day in 1930 (itself a doubtful procedure), artificially 
increased the incidence rates for the areas with greater population 
turnover. This statistical error underlies and at least partly 
invalidates their conclusions. 

In the inter-comparison of psychoses, the principal finding was 
that while the schizophrenic patients were concentrated in the more 
central areas, the distribution for manic depressive cases was quite 
random. Because of the intimate relationship between the location 
of a community within a city and its financial status, this merely 
confirms the annual mental hospital statistical reports which for the 
past quarter century have pointed out that the socio-economic 
status of manic depressive patients is somewhat superior to that of 
schizophrenics. 

Either completely unacquainted with or conveniently forgetful of 
basic psychiatric facts, the authors interpret their inadequately con- 
trolled statistical data as demonstrating that mental diseases are 
produced by sociological factors. With no mention of the important 
hereditary studies of Kallmann, Rosanoff, Slater, Luxenburger and 
others, which have definitely established the strong hereditary basis of 
schizophrenia and manic depressive psychoses, the hypothesis is 
offered that since paranoid schizophrenic cases are concentrated in 
rooming-house areas, it necessarily follows that social isolation makes 
for mental breakdown. The fact that the paranoid individuals, as a 
consequence of their personality make-up, are impossible to live with, 
and, hence, are forced to live in rooming houses is passed over lightly. 

The random distribution of manic depressive psychoses required 
greater ingenuity. The explanation offered to fit their finding was 
that there exists a causal relationship between manic depressive 
psychoses and “extremely intimate and intense social contacts”— 
which contacts may occur at all social and economic levels. Unfor- 
tunately for psychiatrists and mental hygienists, the nature of these 
contacts is not stated. In his introduction Burgess, in summarizing the 
chief facts relative to mental disease that must be taken into consideration 
in the interpretation of the data, states: “The financial depression 
beginning in 1929 was accompanied by little or no increase in mental 
disorders.” Consequently the social repercussions resulting from loss 
of life-time savings, security and employment are not the critical 
“social contacts.” Neither are the “social contacts” produced by 
worry and death of loved ones the vital contacts, since the World War 
resulted in no increase in either manic depressive psychosis or any other 
psychoses. 

Ecological surveys of this type are much needed in mental disease. 
However, it is hoped that subsequent studies will be more rigidly 
controlled statistically and founded on psychiatric facts rather than 
sociological bias. James D. Page. 

The University of Rochester. 


Victor H. Noll. The Teaching of Science in Elementary and Secondary 
Schools. New York: Longmans, Green and Co., 1939, pp. 238. 


It has been repeatedly pointed out during the past decade that we 
are living in a culture made possible by discoveries in natural, and 
especially physical, science. The results of scientific investigation 
have been accepted with little or no understanding of them by the 
people at large, and with no appreciation of the meaning of scientific 
method. This latter has been shown again and again by investigations 
on children’s and adults’ beliefs in superstitions of all sorts. Such 
extensive ignorance of science as is actually found appears inexcusable 
when it is remembered that science instruction has been an important 
part of the secondary-school curriculum for half a century. 

Noll has undertaken to evaluate the science teaching of the past 
and present and to suggest improvements for the future. Until the 
beginning of the third decade of the century science teaching was found 
almost exclusively in the secondary school, and was organized in 
courses in specific subjects, especially physics, chemistry, and biology. 
During the past two decades science of a simple sort has been carried 
lower and lower in the school until in some systems it is now a part of 
the program from the first grade. Analysis of the content of courses, 
their relation to the needs and interests of students, and measurements 
of achievement show vagueness and inconsistencies which suggest that 
science teachers have not used their own methods in dealing with their 
subject. 

A second change during the last two decades has been the introduc- 
tion of general science, especially at the junior high school level. 
These courses have shown increasing enrollments while the traditional 
physics, chemistry, and biology have shown decreasing enrollments. 
In his summary of studies on the preparation of science teachers Noll 
finds that, while the universities have adequate work available, the 
normal schools usually do not; while certain large cities require 
above-average training, most state systems do not; the science in the 
elementary school is most often taught by teachers who have had 
practically no work in science; and that the performance of a group of 
entering normal-school freshmen on a test in general science was 
inferior to that of a junior-high-school class just finishing this subject. 

This picture of science teaching as surveyed in the last two para- 
graphs is not particularly encouraging. However, the author offers 
proposals for methods, course content, and teacher training which, if 
followed, would of necessity result in definite improvement. For the 
teacher of science subjects and for teachers in training this book 
affords a carefully critical summary of an extensive literature. The 
classified bibliographies at the end of chapters include nearly four 
hundred entries, many of which are to unpublished theses. In addition 
there are brief critical descriptions of thirty-nine tests in science 
subjects. Noll is to be commended for what must inevitably be an 
important contribution to the teaching of science. C. M. Louttit. 

Indiana University. 


Esther McD. Lloyd-Jones and Margaret Ruth Smith. A Student 
Personnel Program for Higher Education. New York: McGraw- 
Hill Book Company, Inc., 1938, pp. 322. 


If this book succeeds in raising more questions than it answers, the 
authors should be satisfied with the amount of attention which it draws 
to an aspect of higher education which has not as yet received its full 
measure of consideration. 

In the first three chapters the authors present a general discussion 
of the philosophy and organization of the student-personnel program, 
suggesting examination of the program in the light of its three aspects 
of “student-personnel point of view,” “student-personnel services,” 
and “student-personnel administration.” The proposed program 
centers around the student as a complete individual possessing and 
possessed by a physical body, social drives, emotions, interests, and 
aesthetic tastes as well as a mind. To meet the various needs of the 
student, the authors admit that highly-trained specialists are required 
in all phases of his life, but maintain that the work of these specialists 
should be coérdinated in the student-personnel program. 

It is unfortunate that the length of the introductory material in 
Chapter III detracts from the effect of the excellent general discussion 
of principles of organization which follows. Too much space is 
devoted to presenting a confused picture of personnel programs in 
colleges and universities today. The five and one-half page list of 
college personnel officers and committees is impressive in its length 
but it is little better than a good guess as to the actual situation in the 
institutions studied. Those who have worked with college catalogues 
will recognize the danger of using lists of faculty and administrative 
staff and titles as a basis of determining even approximate functions 
performed. One is also left with the feeling that the authors, in this 
chapter, missed an important opportunity to show the relation of the 
personnel officer and his staff to the other officers and departments of 
the college. In spite of these inadequacies, it is a challenging chapter, 
stimulating consideration of a program in terms of local conditions 
rather than offering cut-to-pattern solutions to ready-made problems. 

In the last fifteen chapters, the functions of the student-personnel 
program are discussed: Selection and admission, orientation, the 
social program, counseling, discipline, educational and vocational 
guidance, financial aid, extracurricular activities, housing, health, 
religion, placement, student-personnel records, personnel office 
administration, and research and evaluation of the personnel program. 

If the reader expects more than a general statement regarding 
each of these functions, he will be disappointed, but he can hardly 
read these chapters without being led to a more thoughtful considera- 
tion of his own institution and functions in relation to the educational 
program. Further study is suggested in the bibliography which follows 
each chapter. The person with a background of experience in 
higher education will regret the omission of discussion of the difficulties 
as well as the advantages of certain techniques such as the training of 
sophomore counselors. He will feel unsatisfied when confronted with 
a statement of a desirable end to be gained without any indication as 
to method of achievement, as in the discussion of participation in 
extracurricular activities when the authors make the statement that 
the best method of stimulation is to make participation attractive, 
and the reader wonders how. 

On the other hand, anyone engaged in the counseling of students 
would do well to reread, from time to time, the fourteen points stressed 
in the chapter on counseling and incorporate them in his thinking and 
living. And no administrator is so good that he can afford to assume 
that the chapters on administration of the personnel office and research 
are without special messages to him. The book ends with a note that 
needs to be stressed—the need for constant evaluation of the program 
so that it may continue to be a vital, dynamic service. 

In spite of the fact that the authors, in their enthusiasm for the 
student-personnel program, seem, at moments, to take for granted a 
freely functioning ideal program without due regard to the many 
interacting and often conflicting elements and personalities in the 
college or university, the book provides, as the authors hoped it would, 
stimulation of ideas for administrators, teachers, and personnel workers 
in institutions of higher education. Ruth E. Salley. 

Office of the Registrar, Hunter College of the City of New York. 


Lois Coffey Mossman. The Activity Concept, an Interpretation. 
New York: The Macmillan Company, 1938, pp. 197. 


Mrs. Mossman has written a very stimulating little book describing 
both the practical and the theoretical aspects of the activity concept. 
The first one hundred pages emphasize practice and are full of interesting 
and stimulating ideas. Chapter I, “Planning for the opening 
day of school,” is particularly suggestive. The teacher who wanted 
the children to work as a group and started them off the first day of 
school opening boxes, hanging curtains, arranging furniture, and 
making their school room attractive (p. 9) or one who oriented 
another class to its own tasks by having it study what other boys and 
girls of the same age were doing in other schools deserves a following. 

Mrs. Mossman’s concept of the learning process implies a profound 
understanding of children and their ways. This is made 
clearest in Chapter V, “Developing Abilities,” where our “blind 
acceptance of mere repetition as adequate for learning a skill” is 
analyzed and the implications of a different practice in which meaning 
is paramount are described most adequately. The teaching of spelling, 
reading, writing, arithmetic, and numerous other “areas” is made a 
stimulating and worth-while activity if the emphasis is upon 
“purposing” rather than upon rote memory. 

As is true of most Progressive educators, Mrs. Mossman seems to 
believe that the school should accept responsibility for all aspects of 
the child’s development. “In times past it has been the practice of 
the school to take over and carry on the work of an agency which is 
failing or unready to carry its load” (p. 31). Unlike many Progressive 
educators, Mrs. Mossman is willing to discriminate among various aims 
and give a judgment regarding the major responsibility of formal 
education, to wit: “The chief work of the school is to teach people to 
meet situations by teaching them to act on their thinking.” To the 
reviewer this position implies a high degree of sanity. 

In the first five or six chapters the author made too great use of 
the pedantic practice of “listing.” There are long lists of questions 
to be answered about children, communities, and schools (p. 5); 
characteristics of democracy (p. 36); earmarks of good learning (p. 52); 
the behavioral categories of mankind (p. 54); school activities (p. 
61); criteria for the good life (p. 72); and essentials involved in teach- 
ing (p. 114). These enumerated items were frequently good items but 
they remind the reader too often of a Nineteenth Century catechism. 

Appended to the volume is an interesting chronology of points of 
view and practices related to the activity concept. 

STEPHEN M. Corey. 
University of Wisconsin. 






























































